Block (programming)

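In computer programming, a block (also called a code block or block of code) is a lexical structure that groups source code together. A block consists of one or more declarations and statements. A programming language that permits the creation of blocks, including blocks nested within other blocks, is called a block-structured programming language. Blocks and subroutines are fundamental to structured programming, whose characteristic control structures can be formed from blocks.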

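Blocks serve two functions in programming. They allow a group of statements to be treated as if it were a single statement, and they limit the lexical scope of objects declared in a block, such as variables, procedures and functions, so that these do not conflict with objects of the same name used elsewhere. In a block-structured programming language, the names of objects declared outside a block are visible inside it, unless they are shadowed by an object declared inside the block with the same name.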
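The visibility and shadowing rules described above can be sketched in JavaScript, used here purely for illustration. The function name `shadowingDemo` and the variable `count` are hypothetical and do not come from any of the cited reports.

```javascript
// Hypothetical example: block scope and shadowing with let.
function shadowingDemo() {
  const seen = [];
  let count = 1;          // declared in the function's outermost block
  {
    seen.push(count);     // the outer name is visible inside a nested block
    {
      let count = 2;      // a new object with the same name shadows the outer one
      seen.push(count);   // refers to the inner count
    }
    seen.push(count);     // shadowing ended together with the inner block
  }
  return seen;
}
```

Calling `shadowingDemo()` returns `[1, 2, 1]`: the inner declaration hides the outer one only for the extent of its own block.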

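History

The idea of block structure emerged in the 1950s during the development of the first autocodes, and was formalized in the ALGOL 60 report. ALGOL 58 introduced the notion of the "compound" statement, which was related solely to control flow[1]. The ALGOL 60 report introduced the notions of block and scope[2]. Finally, the Revised Report defined a compound statement as a sequence of statements enclosed between the statement brackets begin and end, and a block as a sequence of declarations followed by a sequence of statements, enclosed between begin and end; every declaration appears in a block in this way and is valid only for that block[3]. The main difference between a block and a compound statement is that a go to statement cannot jump from outside a block to a label inside it[4].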

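Syntax

Blocks use different syntax in different language families:

In addition, compound statements may be delimited by:

To build control structures, besides enclosing the controlled statement sequence in a compound statement or anonymous block, other syntactic mechanisms may be used: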

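Restrictions

Some languages influenced by ALGOL support blocks, each with its own restrictions:

  • C-family languages allow blocks and compound statements to contain not only nested compound statements but also nested anonymous blocks with declarations; they do not, however, allow nested functions to be declared[8].
  • Pascal-family languages do not allow anonymous blocks with declarations inside the compound statement of the statement part[6]; they support only compound statements, which group statement sequences within control statements such as if, while and repeat.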

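Basic semantics

The semantics of a block is twofold. First, it gives the programmer a way to build arbitrarily large and complex structures that can be treated as a unit. Second, it enables the programmer to limit the scope of variables, and sometimes of other declared objects.

Early languages such as FORTRAN and BASIC had no statement blocks and only rudimentary control structures. Until FORTRAN 77 was standardized in 1978, there was no "block IF" statement, so conditional selection had to resort to GOTO statements. For example, the following FORTRAN fragment deducts from an employee's wages the tax on the portion above the standard tax threshold and the supertax on the portion above the supertax threshold: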

C     Language: ANSI standard FORTRAN 66
C     Initialize the values to be computed
      PAYSTX = .FALSE.
      PAYSST = .FALSE.
      TAX = 0.0
      SUPTAX = 0.0
C     Skip tax deduction if wages do not exceed the tax threshold
      IF (WAGES .LE. TAXTHR) GOTO 10
        PAYSTX = .TRUE.
        TAX = (WAGES - TAXTHR) * BASCRT
   10 CONTINUE
C     Skip supertax deduction if wages do not exceed its threshold
      IF (WAGES .LE. SUPTHR) GOTO 20
        PAYSST = .TRUE.
        SUPTAX = (WAGES - SUPTHR) * SUPRAT
   20 CONTINUE
      TAXED = WAGES - TAX - SUPTAX

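The logical structure of the program is not reflected in the code: the values set during initialization are the ones that should remain in effect when the employee's wages do not exceed the corresponding threshold.

Blocks allow the programmer to treat a group of statements as a unit. For example, the Pascal fragment corresponding to the FORTRAN code above: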

{ Language: standard Pascal as defined by Jensen and Wirth }
if Wages > TaxThreshold then
begin
    PaysTax := true;
    Tax := (Wages - TaxThreshold) * TaxRate
end
else begin
    PaysTax := false;
    Tax := 0
end;
if Wages > SupertaxThreshold then
begin
    PaysSupertax := true;
    Supertax := (Wages - SupertaxThreshold) * SupertaxRate
end
else begin
    PaysSupertax := false;
    Supertax := 0
end;
Taxed := Wages - Tax - Supertax;

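Compared with the FORTRAN code above, the default values that appeared in the initialization are here placed, by means of compound statements (that is, block structure without declarations), at the points where the relevant tests are made. Using block structure clarifies the programmer's intent and makes the structure of the code reflect the programmer's thinking more closely. Combined with a consistent indentation style and camel case to improve readability, it makes the code easier to understand and modify.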
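For comparison, the same computation can be sketched in a brace-delimited language. The JavaScript below is only an illustration modelled on the Pascal fragment; the function name `computeTax`, its parameter list and the shape of the returned object are assumptions and not part of the original examples.

```javascript
// Hypothetical JavaScript counterpart of the Pascal fragment.
// Each branch of an if/else is a block that is treated as one statement.
function computeTax(wages, taxThreshold, taxRate, supertaxThreshold, supertaxRate) {
  let paysTax, tax, paysSupertax, supertax;

  if (wages > taxThreshold) {
    paysTax = true;
    tax = (wages - taxThreshold) * taxRate;
  } else {
    // The default values sit next to the test they belong to.
    paysTax = false;
    tax = 0;
  }

  if (wages > supertaxThreshold) {
    paysSupertax = true;
    supertax = (wages - supertaxThreshold) * supertaxRate;
  } else {
    paysSupertax = false;
    supertax = 0;
  }

  return { paysTax, tax, paysSupertax, supertax, taxed: wages - tax - supertax };
}
```

With the illustrative inputs `computeTax(3000, 500, 0.5, 2000, 0.25)`, the tax is 1250, the supertax is 250 and `taxed` is 1500.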

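In early languages, the scope of a variable in a subroutine extended over the whole subroutine. Suppose a Fortran subroutine carries out a task concerning a manager and uses an integer variable named IEMPNO to hold the Social Security number (SSN) of the employee who is the manager. If a task concerning the manager's subordinates is added during later maintenance, the programmer may inadvertently reuse the name IEMPNO for the SSN of a subordinate, which results in a bug that is hard to trace.

Block structure lets the programmer easily control scope down to a fine level. For example, a Scheme fragment that carries out the employee-related tasks: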

;; Language: Scheme (R5RS standard)
(let ((empno (ssn-of employee-name)))
  (when (is-manager? empno) ;; when is part of the R7RS-small standard
    (let ((employee-list (underlings-of empno)))
      (display
        ;; format is the string-formatting procedure specified by SRFI-28 and SRFI-48
        (format "~a has ~a employees working under him:~%"
          employee-name (length employee-list)))
      (for-each
        (lambda (empno)
          (display
            (format "Name: ~a, role: ~a~%"
              (name-of empno) (role-of empno))))
        employee-list))))

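Here the outer let binds the manager's SSN to the local variable empno, and within the scope of the block it forms, the manager's name and the number of his subordinates are printed. The higher-order procedure for-each then binds the SSN of each subordinate in turn to the formal parameter empno of the anonymous lambda function, which prints that subordinate's name and role. The scope of this formal parameter is the body of the anonymous function; it has the same identifier as the local variable in the enclosing scope, but the two do not affect each other. In practice, for the sake of clarity, a programmer would more likely choose distinctly different variable names, but even when a name is repeated it is hard to introduce a bug inadvertently. Languages based on S-expressions often contain many nested parentheses, so their code needs good indentation.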
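The same scoping pattern can be sketched in JavaScript as a rough analogue of the Scheme fragment. Everything here is hypothetical: the function `describeManager` and the `directory` object with its `ssnOf`, `managers`, `underlingsOf`, `nameOf` and `roleOf` fields merely stand in for the helper procedures that the Scheme code assumes.

```javascript
// Hypothetical analogue of the Scheme fragment; directory is assumed test data.
function describeManager(employeeName, directory) {
  const lines = [];
  const empno = directory.ssnOf[employeeName];   // outer binding: the manager
  if (directory.managers.has(empno)) {
    const employeeList = directory.underlingsOf[empno];
    lines.push(`${employeeName} has ${employeeList.length} employees working under him:`);
    employeeList.forEach((empno) => {            // parameter shadows the outer empno
      lines.push(`Name: ${directory.nameOf[empno]}, role: ${directory.roleOf[empno]}`);
    });
  }
  // Outside the callback, empno still denotes the manager.
  return { managerId: empno, lines };
}
```

The parameter of the arrow function is a separate binding whose scope is the function body, so the outer `empno` keeps its value after the loop finishes.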

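Hoisting

In some languages, a variable can be declared with function scope even when the declaration sits in a block nested inside the function. In JavaScript, for example, variables should always be declared before use; in non-strict code, assigning to an undeclared variable creates an implicit global variable, and in strict mode it is an error. Variables declared with var have function scope, rather than the block scope that variables declared with let or const have. A variable declared with var is hoisted, meaning that it can be referred to anywhere within the function's scope, even before its declaration is reached, so a var declaration can be regarded as lifted to the top of its enclosing function or of the global scope. However, if such a variable is accessed before its declaration, its value is always undefined.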
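The behaviour described above can be observed with a few small functions. This is a minimal sketch; the function names and the identifier `undeclaredName` are hypothetical, and the last function assumes that no global variable of that name exists.

```javascript
// var: function-scoped and hoisted, even when declared in a nested block.
function varIsHoisted() {
  const before = x;     // no error: the declaration is hoisted, the value is undefined
  {
    var x = 42;         // the assignment happens here, the declaration does not
  }
  return [before, x];   // x is still visible outside the nested block
}

// let: scoped to the block in which it is declared.
function letIsBlockScoped() {
  {
    let y = 42;
  }
  return typeof y;      // y does not exist outside its block
}

// Strict mode: assigning to an undeclared variable is an error.
function assignUndeclaredInStrictMode() {
  "use strict";
  try {
    undeclaredName = 1; // hypothetical name, never declared anywhere
    return "no error";
  } catch (e) {
    return e.name;
  }
}
```

Here `varIsHoisted()` returns `[undefined, 42]`, `letIsBlockScoped()` returns `"undefined"`, and `assignUndeclaredInStrictMode()` returns `"ReferenceError"`.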

References

  1. Perlis, A. J.; Samelson, K. Preliminary report: international algebraic language (PDF). Communications of the ACM (New York, NY, USA: ACM). 1958, 1 (12): 8–22 [2023-02-20]. doi:10.1145/377924.594925. (Archived (PDF) from the original on 2023-02-20). Strings of one or more statements may be combined into a single (compound) statement by enclosing them within the "statement parentheses" begin and end. Single statements are separated by the statement separator ";".
  2. John Backus; Friedrich L. Bauer; J. Green; C. Katz; John McCarthy; Alan Jay Perlis; Heinz Rutishauser; K. Samelson; B. Vauquois; J. H. Wegstein; A. van Wijngaarden; M. Woodger. Peter Naur (ed.). Report on the Algorithmic Language ALGOL 60 (PDF). Communications of the ACM 3 (5). New York, NY, USA: ACM: 299–314. May 1960 [2009-10-27]. ISSN 0001-0782. doi:10.1145/367236.367262. (Archived (PDF) from the original on 2022-12-13). Sequences of statements may be combined into compound statements by insertion of statement brackets. ……
    Each declaration is attached to and valid for one compound statement. A compound statement which includes declarations is called a block.
     
  3. J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur (ed.). Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963 [2023-02-20]. (Archived from the original on 2023-02-20). A sequence of statements may be enclosed between the statement brackets begin and end to form a compound statement. ……
    A sequence of declarations followed by a sequence of statements and enclosed between begin and end constitutes a block. Every declaration appears in a block in this way and is valid only for that block.
     
  4. J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur (ed.). Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963 [2023-02-20]. (Archived from the original on 2023-02-20). Since labels are inherently local, no go to statement can lead from outside into a block. A go to statement may, however, lead from outside into a compound statement.
  5. A. van Wijngaarden, B. J. Mailloux, J. E. L. Peck, C. H. A. Koster, M. Sintzoff, C. H. Lindsey, L. G. L.T. Meertens and R. G. Fisker. Revised Report on the Algorithmic Language Algol 68. IFIP W.G. 2.1. [2023-02-20]. (Archived from the original on 2020-07-11). The ALGOL 60 concepts of block, compound statement and parenthesized expression are unified in ALGOL 68 into the serial-clause. A serial-clause may be an expression and yield a value. ……
    A serial-clause consists of a possibly empty sequence of unlabelled phrases, the last of which, if any, is a declaration, followed by a sequence of possibly labelled units. The phrases and the units are separated by go-on-tokens, viz., semicolons. Some of the units may instead be separated by completers, viz., EXITs; after a completer, the next unit must be labelled so that it can be reached. The value of the final unit, or of a unit preceding an EXIT, determines the value of the serial-clause.
     
  6. Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991 [2023-02-20]. (Archived (PDF) from the original on 2023-02-20). The program is divided into a heading and a body, called a block. The heading gives the program a name and lists its parameters. …… The block consists of six sections, where any except the last may be empty. They must appear in the order given in the definition for a block:
    Block =
        LabelDeclarationPart
        ConstantDefinitionPart
        TypeDefinitionPart
        VariableDeclarationPart
        ProcedureAndFunctionDeclarationPart
        StatementPart.
    ……
    Each procedure and function declaration has a structure similar to a program; i.e. , each consists of a heading and a block. ……
    The compound statement is that of Algol, and corresponds to the DO group in PL/I. ……
    The "block structure" differs from that of Algol and PL/I insofar as there are no anonymous blocks; i.e., each block is given a name and thereby is made into a procedure or function.
     
  7. Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991 [2023-02-20]. (Archived (PDF) from the original on 2023-02-20). The compound statement specifies that its component statements be executed in the same sequence as they are written. The symbols begin and end act as statement brackets. ……
    Pascal uses the semicolon to separate statements, not to terminate statements; i.e., the semicolon is not part of the statement.
     
  8. Brian Kernighan, Dennis Ritchie. The C Programming Language, Second Edition (PDF). Prentice Hall. 1988. In C, the semicolon is a statement terminator, rather than a separator as it is in languages like Pascal.
    Braces { and } are used to group declarations and statements together into a compound statement, or block, so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while, or for are another. (Variables can be declared inside any block; ……) There is no semicolon after the right brace that ends a block. ……
    A label has the same form as a variable name, and is followed by a colon. It can be attached to any statement in the same function as the goto. The scope of a label is the entire function. ……
    C is not a block-structured language in the sense of Pascal or similar languages, because functions may not be defined within other functions. On the other hand, variables can be defined in a block-structured fashion within a function. Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function. Variables declared in this way hide any identically named variables in outer blocks, and remain in existence until the matching right brace. ……
    An automatic variable declared and initialized in a block is initialized each time the block is entered.
    Automatic variables, including formal parameters, also hide external variables and functions of the same name.
     
  9. John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart, Michael I. Levin. LISP 1.5 Programmer's Manual (PDF) (2nd ed.). MIT Press. 1985 [1962] [2021-09-23]. ISBN 0-262-13011-4. (Archived (PDF) from the original on 2021-03-02). The LISP 1.5 program feature allows the user to write an Algol-like program containing LISP statements to be executed. ……
    The program form has the structure - (PROG, list of program variables, sequence of statements and atomic symbols...) An atomic symbol in the list is the location marker for the statement that follows.
     
  10. Kent M. Pitman. The Revised Maclisp Manual. 1983, 2007 [2021-10-14]. (Archived from the original on 2021-12-21). LET is used to bind some variables to some objects, and then to evaluate some forms (those which make up the body) in the context of those bindings. ……
    LET* Same as LET but does bindings in sequence instead of in parallel.