Block (programming)
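In computer programming, a block (also called a program block or code block) is a lexical structure that groups source code together. A block consists of one or more declarations and statements. A programming language that permits the creation of blocks, including blocks nested within other blocks, is called a block-structured programming language. Blocks and subroutines are the foundation of structured programming, whose control structures can be built out of blocks.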
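Blocks serve two purposes in programming. They let a group of statements be treated as if it were a single statement, and they confine the lexical scope of objects declared in a block, such as variables, procedures and functions, so that these do not conflict with objects of the same name used elsewhere. In a block-structured programming language, the names of objects declared outside a block are visible inside it, unless they are shadowed by an object declared there with the same name.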
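The visibility and shadowing rules above can be illustrated with a minimal JavaScript sketch. The function name describe and the variable name rate are hypothetical, chosen only for this illustration; const is used because it is block-scoped.

```javascript
// Illustrative sketch only: `describe` and `rate` are hypothetical names.
function describe() {
  const log = [];
  const rate = 0.2;        // declared in the outer block (the function body)
  {
    log.push(rate);        // the outer name is visible inside the inner block
    {
      const rate = 0.4;    // shadows the outer `rate` within this block only
      log.push(rate);
    }
    log.push(rate);        // the outer `rate` was never affected
  }
  return log;
}

console.log(describe()); // [ 0.2, 0.4, 0.2 ]
```

The innermost declaration hides the outer one only between its own braces, so the two variables never conflict.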
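History
The idea of block structure emerged in the 1950s during the development of the first Autocodes, and was formalized in the ALGOL 60 report. ALGOL 58 introduced the notion of the "compound" statement, which was concerned only with control flow[1]. The ALGOL 60 report introduced the notions of block and scope[2]. Finally, the Revised Report defined a compound statement as a sequence of statements enclosed between the statement brackets begin and end. It defined a block as a sequence of declarations followed by a sequence of statements, enclosed between begin and end; every declaration appears in a block in this way and is valid only for that block[3]. The main difference between a block and a compound statement is that no jump can lead from outside a block to a label inside it[4].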
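Syntax
Blocks use different syntax in different language families:
- The ALGOL family: ALGOL 60 and its successors such as Simula use the statement brackets begin and end to delimit compound statements and blocks. ALGOL 68 became an expression-oriented programming language and favours the parentheses ( and ), which are equivalent to begin and end[5].
- The Lisp family: Lisp 1.5 uses S-expressions with the syntactic keyword prog to denote blocks[9], while Maclisp and Scheme use S-expressions of the let form[10]. An S-expression is prefix notation enclosed in the parentheses ( and ).
- The Smalltalk family: Smalltalk-80 and Self use the square brackets [ and ] to delimit blocks.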
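In addition, compound statements can also be delimited by:
To build control structures, besides enclosing the controlled statement sequence in a compound statement or an anonymous block, other syntactic mechanisms can be used:
- In ALGOL 68, conditional and iteration statements are terminated by the opening reserved word spelled backwards, for example IF ~ THEN ~ ELIF ~ THEN ~ ELSE ~ FI and FOR ~ FROM ~ TO ~ BY ~ WHILE ~ DO ~ OD. This style was inherited by Dijkstra's guarded command language and Bourne's Bourne shell, among others.
- Some structured programming languages, such as FORTRAN 77, Modula-2, Ada and Visual Basic, close control structures with an ending keyword, for example in Modula-2: IF ~ THEN ~ ELSIF ~ THEN ~ ELSE ~ END and FOR ~ TO ~ BY ~ DO ~ END.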
Limitations
Some languages influenced by ALGOL support blocks, but each with its own limitations:
Basic semantics
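The semantics of a block are twofold. First, a block gives the programmer a way to build arbitrarily large and complex structures and treat each one as a single unit. Second, it lets the programmer limit the scope of variables, and sometimes of other declared objects.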
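Early languages such as FORTRAN and BASIC had no statement blocks and only a few built-in control structures. Until FORTRAN 77 was standardized in 1978 there was no block IF statement, so conditional selection had to be implemented with GOTO statements. For example, the following FORTRAN fragment deducts from an employee's wages the tax due on the portion above the standard tax threshold and the supertax due on the portion above the supertax threshold: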
C     Language: ANSI standard FORTRAN 66
C     Initialize the values to be calculated
      PAYSTX = .FALSE.
      PAYSST = .FALSE.
      TAX = 0.0
      SUPTAX = 0.0
C     Skip the tax deduction if the employee earns no more than
C     the tax threshold
      IF (WAGES .LE. TAXTHR) GOTO 10
      PAYSTX = .TRUE.
      TAX = (WAGES - TAXTHR) * BASCRT
   10 CONTINUE
C     Skip the supertax deduction if the employee earns no more than
C     the supertax threshold
      IF (WAGES .LE. SUPTHR) GOTO 20
      PAYSST = .TRUE.
      SUPTAX = (WAGES - SUPTHR) * SUPRAT
   20 CONTINUE
      TAXED = WAGES - TAX - SUPTAX
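The logical structure of the program is not reflected in the code: the values set during initialization are the ones that must hold when the wages do not exceed the corresponding threshold, yet they are written far from the tests they belong to.
Blocks allow the programmer to treat a group of statements as a unit. For example, the Pascal fragment corresponding to the FORTRAN code above is: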
{ Language: standard Pascal as defined by Jensen and Wirth }
if Wages > TaxThreshold then
  begin
    PaysTax := true;
    Tax := (Wages - TaxThreshold) * TaxRate
  end
else begin
  PaysTax := false;
  Tax := 0
end;
if Wages > SupertaxThreshold then
  begin
    PaysSupertax := true;
    Supertax := (Wages - SupertaxThreshold) * SupertaxRate
  end
else begin
  PaysSupertax := false;
  Supertax := 0
end;
Taxed := Wages - Tax - Supertax;
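Compared with the FORTRAN code above, the default values that used to appear in the initialization are now placed, by means of compound statements (that is, blocks without declarations), at the points where the corresponding tests are made. Block structure makes the programmer's intent explicit and lets the structure of the code mirror the programmer's thinking more closely. Combined with a consistent indentation style and camel case for readability, it makes the code easier to understand and to modify.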
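The same structure carries over to brace-delimited languages. The following JavaScript sketch mirrors the Pascal fragment; the threshold and rate constants and the function name computeTaxed are assumptions made up for this illustration and do not come from the original examples.

```javascript
// Hypothetical thresholds and rates, chosen only for illustration.
const TAX_THRESHOLD = 10000;
const SUPERTAX_THRESHOLD = 50000;
const TAX_RATE = 0.2;
const SUPERTAX_RATE = 0.1;

// Each branch is a block, so a default value sits next to the test it belongs to.
function computeTaxed(wages) {
  let paysTax, tax, paysSupertax, supertax;

  if (wages > TAX_THRESHOLD) {
    paysTax = true;
    tax = (wages - TAX_THRESHOLD) * TAX_RATE;
  } else {
    paysTax = false;
    tax = 0;
  }

  if (wages > SUPERTAX_THRESHOLD) {
    paysSupertax = true;
    supertax = (wages - SUPERTAX_THRESHOLD) * SUPERTAX_RATE;
  } else {
    paysSupertax = false;
    supertax = 0;
  }

  return { paysTax, tax, paysSupertax, supertax, taxed: wages - tax - supertax };
}

console.log(computeTaxed(5000));
console.log(computeTaxed(60000));
```

Wages below both thresholds leave both flags false and the wages untouched, while wages above both thresholds trigger both deductions.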
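In early languages, the scope of a variable in a subroutine extended over the whole subroutine. Suppose a Fortran subroutine carries out a task concerning managers and uses an integer variable called IEMPNO to hold the Social Security number (SSN) of the employee who is the manager. Later, during maintenance, a task concerning subordinates is added, and the programmer may inadvertently reuse the variable IEMPNO to hold the SSN of an employee who reports to that manager. The result is a bug that is hard to trace.
Block structure lets the programmer control scope easily down to a fine level. For example, a Scheme fragment that carries out the employee-related task: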
;; Language: R5RS standard Scheme
(let ((empno (ssn-of employee-name)))
  (when (is-manager? empno) ;; when is part of the R7RS-small standard
    (let ((employee-list (underlings-of empno)))
      (display
        ;; format is the string-formatting procedure specified by SRFI-28 and SRFI-48
        (format "~a has ~a employees working under him:~%"
                employee-name (length employee-list)))
      (for-each
        (lambda (empno)
          (display
            (format "Name: ~a, role: ~a~%"
                    (name-of empno) (role-of empno))))
        employee-list))))
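Here the outer let binding form binds the manager's SSN to the local variable empno, and within the scope of the block it forms, the code prints the manager's name and the number of his subordinates. The higher-order procedure for-each then binds the SSN of each subordinate in turn to the formal parameter empno of the anonymous lambda function, which prints that subordinate's name and role. The scope of this formal parameter is the body of the anonymous function. It shares its identifier with the local variable of the enclosing block, but the two do not affect each other. In practice, for the sake of clarity, a programmer would more likely choose clearly different variable names, but even when names are repeated it is hard to introduce a bug inadvertently. Languages based on S-expressions often contain many nested parentheses, so their code needs good indentation.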
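The same scoping pattern can be sketched in JavaScript. Everything below is an assumption made for illustration: the in-memory employees table stands in for the ssn-of, underlings-of, name-of and role-of procedures of the Scheme fragment, and the identifiers are made-up placeholders rather than real data.

```javascript
// Hypothetical data: a stand-in for the lookup procedures of the Scheme fragment.
const employees = new Map([
  ["111", { name: "Ada", role: "manager", underlings: ["222", "333"] }],
  ["222", { name: "Bob", role: "engineer", underlings: [] }],
  ["333", { name: "Cy", role: "analyst", underlings: [] }],
]);

function report(empno) {
  const lines = [];
  const manager = employees.get(empno);
  if (manager.underlings.length > 0) {
    const employeeList = manager.underlings;   // visible only inside this block
    lines.push(`${manager.name} has ${employeeList.length} employees working under them:`);
    employeeList.forEach((empno) => {          // parameter shadows the outer empno
      const employee = employees.get(empno);
      lines.push(`Name: ${employee.name}, role: ${employee.role}`);
    });
  }
  lines.push(`Report for ${empno}`);           // the outer empno is unchanged
  return lines;
}

console.log(report("111").join("\n"));
```

The arrow function's parameter hides the outer empno only inside the callback body, so the final line still refers to the manager's identifier.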
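Hoisting
In some languages, a variable can be declared with function scope even when its declaration sits inside a block nested within the function. In JavaScript, for example, variables should always be declared before use. Assigning to an undeclared variable in non-strict code implicitly creates a global variable; in strict mode this is an error. Variables declared with var have function scope, rather than the block scope that variables declared with let or const have. Variables declared with var are hoisted: the variable can be referred to anywhere within the function's scope, even before its declaration has been reached, so a var declaration can be regarded as lifted to the top of its enclosing function or of the global scope. However, if such a variable is accessed before its declaration, its value is always undefined.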
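A minimal sketch of the difference, assuming a standard JavaScript engine such as Node.js; the function name hoistingDemo and the variable names v and w are hypothetical. Accessing a let variable before its declaration raises a ReferenceError, whereas a var variable is merely undefined.

```javascript
function hoistingDemo() {
  const seen = [];

  seen.push(typeof v, v);   // `v` already exists here, but holds undefined
  {
    var v = 1;              // function-scoped although written in a nested block
  }
  seen.push(v);             // still visible after the block has ended

  try {
    seen.push(w);           // `w` is block-scoped and not yet initialized
    let w = 2;
  } catch (err) {
    seen.push(err instanceof ReferenceError);
  }
  return seen;
}

console.log(hoistingDemo()); // [ 'undefined', undefined, 1, true ]
```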
See also
References
- [1] Perlis, A. J.; Samelson, K. Preliminary report: international algebraic language (PDF). Communications of the ACM (New York, NY, USA: ACM). 1958, 1 (12): 8–22 [2023-02-20]. doi:10.1145/377924.594925. (Archived (PDF) from the original on 2023-02-20.)
  Strings of one or more statements may be combined into a single (compound) statement by enclosing them within the "statement parentheses" begin and end. Single statements are separated by the statement separator ";".
- [2] John Backus; Friedrich L. Bauer; J. Green; C. Katz; John McCarthy; Alan Jay Perlis; Heinz Rutishauser; K. Samelson; B. Vauquois; J. H. Wegstein; A. van Wijngaarden; M. Woodger. Peter Naur, ed. Report on the Algorithmic Language ALGOL 60 (PDF). Communications of the ACM 3 (5). New York, NY, USA: ACM: 299–314. May 1960 [2009-10-27]. ISSN 0001-0782. doi:10.1145/367236.367262. (Archived (PDF) from the original on 2022-12-13.)
  Sequences of statements may be combined into compound statements by insertion of statement brackets. …
  Each declaration is attached to and valid for one compound statement. A compound statement which includes declarations is called a block.
- [3] J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur, ed. Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963 [2023-02-20]. (Archived from the original on 2023-02-20.)
  A sequence of statements may be enclosed between the statement brackets begin and end to form a compound statement. …
  A sequence of declarations followed by a sequence of statements and enclosed between begin and end constitutes a block. Every declaration appears in a block in this way and is valid only for that block.
- [4] J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur, ed. Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963 [2023-02-20]. (Archived from the original on 2023-02-20.)
  Since labels are inherently local, no go to statement can lead from outside into a block. A go to statement may, however, lead from outside into a compound statement.
- [5] A. van Wijngaarden, B. J. Mailloux, J. E. L. Peck, C. H. A. Koster, M. Sintzoff, C. H. Lindsey, L. G. L. T. Meertens and R. G. Fisker. Revised Report on the Algorithmic Language Algol 68. IFIP W.G. 2.1. [2023-02-20]. (Archived from the original on 2020-07-11.)
  The ALGOL 60 concepts of block, compound statement and parenthesized expression are unified in ALGOL 68 into the serial-clause. A serial-clause may be an expression and yield a value. …
  A serial-clause consists of a possibly empty sequence of unlabelled phrases, the last of which, if any, is a declaration, followed by a sequence of possibly labelled units. The phrases and the units are separated by go-on-tokens, viz., semicolons. Some of the units may instead be separated by completers, viz., EXITs; after a completer, the next unit must be labelled so that it can be reached. The value of the final unit, or of a unit preceding an EXIT, determines the value of the serial-clause.
- [6] Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991 [2023-02-20]. (Archived (PDF) from the original on 2023-02-20.)
  The program is divided into a heading and a body, called a block. The heading gives the program a name and lists its parameters. … The block consists of six sections, where any except the last may be empty. They must appear in the order given in the definition for a block:
    Block =
      LabelDeclarationPart
      ConstantDefinitionPart
      TypeDefinitionPart
      VariableDeclarationPart
      ProcedureAndFunctionDeclarationPart
      StatementPart.
  …
  Each procedure and function declaration has a structure similar to a program; i.e., each consists of a heading and a block. …
  The compound statement is that of Algol, and corresponds to the DO group in PL/I. …
  The "block structure" differs from that of Algol and PL/I insofar as there are no anonymous blocks; i.e., each block is given a name and thereby is made into a procedure or function.
- [7] Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991 [2023-02-20]. (Archived (PDF) from the original on 2023-02-20.)
  The compound statement specifies that its component statements be executed in the same sequence as they are written. The symbols begin and end act as statement brackets. …
  Pascal uses the semicolon to separate statements, not to terminate statements; i.e., the semicolon is not part of the statement.
- [8] Brian Kernighan, Dennis Ritchie. The C Programming Language, Second Edition (PDF). Prentice Hall. 1988.
  In C, the semicolon is a statement terminator, rather than a separator as it is in languages like Pascal.
  Braces { and } are used to group declarations and statements together into a compound statement, or block, so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while, or for are another. (Variables can be declared inside any block; …) There is no semicolon after the right brace that ends a block. …
  A label has the same form as a variable name, and is followed by a colon. It can be attached to any statement in the same function as the goto. The scope of a label is the entire function. …
  C is not a block-structured language in the sense of Pascal or similar languages, because functions may not be defined within other functions. On the other hand, variables can be defined in a block-structured fashion within a function. Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function. Variables declared in this way hide any identically named variables in outer blocks, and remain in existence until the matching right brace. …
  An automatic variable declared and initialized in a block is initialized each time the block is entered.
  Automatic variables, including formal parameters, also hide external variables and functions of the same name.
- [9] John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart, Michael I. Levin. LISP 1.5 Programmer's Manual (PDF) 2nd. MIT Press. 1985 [1962] [2021-09-23]. ISBN 0-262-13011-4. (Archived (PDF) from the original on 2021-03-02.)
  The LISP 1.5 program feature allows the user to write an Algol-like program containing LISP statements to be executed. …
  The program form has the structure - (PROG, list of program variables, sequence of statements and atomic symbols...) An atomic symbol in the list is the location marker for the statement that follows.
- [10] Kent M. Pitman. The Revised Maclisp Manual. 1983, 2007 [2021-10-14]. (Archived from the original on 2021-12-21.)
  LET is used to bind some variables to some objects, and then to evaluate some forms (those which make up the body) in the context of those bindings. …
  LET* Same as LET but does bindings in sequence instead of in parallel.