
Natural Language Processing with Python Study Notes (32): 4.4 Functions: The Foundation of Structured Programming

Date: 2022-12-09 17:11:29

4.4 Functions: The Foundation of Structured Programming

Functions provide an effective way to package and re-use program code, as already explained in Section 2.3. For example, suppose we find that we often want to read text from an HTML file. This involves several steps: opening the file, reading it in, normalizing whitespace, and stripping HTML markup. We can collect these steps into a function, and give it a name such as get_text(), as shown in Example 4.2.

Example 4.2 (code_get_text.py): Read text from a file
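A minimal sketch of what such a function might look like, written for Python 3. Only the name get_text(), its single file argument, and the docstring come from the text; the regular expressions and the utf-8 encoding are assumptions made for illustration.

```python
import re

def get_text(file):
    """Read text from a file, normalizing whitespace
    and stripping HTML markup."""
    # Assumption: the file is UTF-8 encoded.
    with open(file, encoding="utf-8") as f:
        text = f.read()
    # Replace each HTML tag with a space (a crude approach, not a real parser).
    text = re.sub(r"<.*?>", " ", text, flags=re.DOTALL)
    # Collapse every run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

A regular expression is enough for a toy example like this; real-world HTML is usually better handled with a dedicated parser.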

Now, any time we want to get cleaned-up text from an HTML file, we can just call get_text() with the name of the file as its only argument. It will return a string, and we can assign this to a variable, e.g.: contents = get_text("test.html"). Each time we want to use this series of steps we only have to call the function.

Using functions has the benefit of saving space in our program. More importantly, our choice of name for the function helps make the program readable. In the case of the above example, whenever our program needs to read cleaned-up text from a file we don't have to clutter the program with four lines of code; we simply need to call get_text(). This naming helps to provide some "semantic interpretation": it helps a reader of our program to see what the program "means".

Notice that the above function definition contains a string. The first string inside a function definition is called a docstring. Not only does it document the purpose of the function to someone reading the code, it is also accessible to a programmer who has loaded the code from a file:

>>> help(get_text)
Help on function get_text:

get_text(file)
    Read text from a file, normalizing whitespace
    and stripping HTML markup.

We have seen that functions help to make our work reusable and readable. They also help make it reliable. When we re-use code that has already been developed and tested, we can be more confident that it handles a variety of cases correctly. We also remove the risk that we forget some important step, or introduce a bug. The program that calls our function also has increased reliability. The author of that program is dealing with a shorter program, and its components behave transparently.

To summarize, as its name suggests, a function captures functionality. It is a segment of code that can be given a meaningful name and which performs a well-defined task. Functions allow us to abstract away from the details, to see a bigger picture, and to program more effectively.

The rest of this section takes a closer look at functions, exploring the mechanics and discussing ways to make your programs easier to read.

Function Inputs and Outputs

We pass information to functions using a function's parameters, the parenthesized list of variables and constants following the function's name in the function definition. Here's a complete example:

We first define the function to take two parameters, msg and num. Then we call the function and pass it two arguments, monty and 3; these arguments fill the "placeholders" provided by the parameters and provide values for the occurrences of msg and num in the function body.
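A sketch of a definition and call that match this description. The function name repeat and the string assigned to monty are assumptions; the text itself names only the parameters msg and num and the arguments monty and 3.

```python
def repeat(msg, num):
    # Join num copies of msg, separated by single spaces.
    return " ".join([msg] * num)

monty = "Monty Python"      # assumed value, for illustration only
result = repeat(monty, 3)   # msg is bound to monty, num is bound to 3
print(result)
```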

It is not necessary to have any parameters, as we see in the following example:

A function usually communicates its results back to the calling program via the return statement, as we have just seen. To the calling program, it looks as if the function call had been replaced with the function's result, e.g.:
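Both points can be illustrated together. In this sketch the names monty() and repeat() are hypothetical: the first function takes no parameters, and the two calls to the second show that a function call can stand wherever its result could.

```python
def monty():
    # No parameters: the function always returns the same string.
    return "Monty Python"

def repeat(msg, num):
    # Join num copies of msg, separated by single spaces.
    return " ".join([msg] * num)

# The call monty() behaves as if it had been replaced by its result.
a = repeat(monty(), 3)
b = repeat("Monty Python", 3)
print(a == b)
```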

A Python function is not required to have a return statement. Some functions do their work as a side effect, printing a result, modifying a file, or updating the contents of a parameter to the function (such functions are called "procedures" in some other programming languages).

Consider the following three sort functions. The third one is dangerous because a programmer could use it without realizing that it had modified its input. In general, functions should either modify the contents of a parameter (my_sort1()) or return a value (my_sort2()), but not both (my_sort3()).
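One way the three functions could be written is sketched below. The function names come from the text; the bodies and the sample list are assumptions chosen to match the description.

```python
def my_sort1(mylist):
    # Acceptable: modifies its argument in place and returns nothing.
    mylist.sort()

def my_sort2(mylist):
    # Acceptable: leaves its argument untouched and returns a new sorted list.
    return sorted(mylist)

def my_sort3(mylist):
    # Risky: modifies its argument and also returns it.
    mylist.sort()
    return mylist

data = [3, 1, 2]
result = my_sort3(data)
print(result, data)   # the caller's own list has been reordered as well
```

Because my_sort3() returns a value, a caller may reasonably assume that the original list is left alone, which is exactly the trap the text warns about.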

Parameter Passing

Back in Section 4.1 you saw that assignment works on values, but that the value of a structured object is a reference to that object. The same is true for functions. Python interprets function parameters as values (this is known as call-by-value). In the following code, set_up() has two parameters, both of which are modified inside the function. We begin by assigning an empty string to w and an empty list to p. After calling the function, w is unchanged, while p is changed:
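A sketch of code that behaves as described. The names set_up, w, p, word and properties, and the final assignment of 5, are taken from the surrounding text; the values "lolcat" and "noun" are placeholders chosen for illustration.

```python
def set_up(word, properties):
    word = "lolcat"              # rebinds the local name only
    properties.append("noun")    # mutates the list shared with the caller
    properties = 5               # rebinds the local name only

w = ""
p = []
set_up(w, p)
print(repr(w), p)
```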

Notice that w was not changed by the function. When we called set_up(w, p), the value of w (an empty string) was assigned to a new variable word. Inside the function, the value of word was modified. However, that change did not propagate to w. This parameter passing is identical to the following sequence of assignments:
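The equivalent assignments for the string case can be sketched as follows; the value "lolcat" is a placeholder chosen for illustration.

```python
w = ""
word = w          # word and w refer to the same empty string
word = "lolcat"   # word now refers to a different string; w is untouched
print(repr(w))
```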

Let's look at what happened with the list p. When we called set_up(w, p), the value of p (a reference to an empty list) was assigned to a new local variable properties, so both variables now reference the same memory location. The function modifies properties, and this change is also reflected in the value of p as we saw. The function also assigned a new value to properties (the number 5); this did not modify the contents at that memory location, but only rebound the local variable properties to a different object. This behavior is just as if we had done the following sequence of assignments:
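The equivalent assignments for the list case can be sketched as follows; the appended value "noun" is a placeholder chosen for illustration.

```python
p = []
properties = p              # both names refer to the same list object
properties.append("noun")   # the shared list is modified
properties = 5              # properties is rebound; the list is unaffected
print(p)
```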

Thus, to understand Python's call-by-value parameter passing, it is enough to understand how assignment works. Remember that you can use the id() function and the is operator to check your understanding of object identity after each statement.
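As a small illustration of that check, the sketch below records the result of is and id() after each step. The variable names holding the results are hypothetical.

```python
p = []
properties = p
same_before = properties is p          # both names refer to one object
ids_match = id(properties) == id(p)    # id() agrees with the is operator

properties.append("noun")
same_after_append = properties is p    # mutation does not change identity

properties = 5
same_after_rebind = properties is p    # rebinding breaks the link

print(same_before, ids_match, same_after_append, same_after_rebind)
```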

Variable Scope

Function definitions create a new, local scope for variables. When you assign to a new variable inside the body of a function, the name is only defined within that function. The name is not visible outside the function, or in other functions. This behavior means you can choose variable names without being concerned about collisions with names used in your other function definitions.

When you refer to an existing name from within the body of a function, the Python interpreter first tries to resolve the name with respect to the names that are local to the function. If nothing is found, the interpreter checks if it is a global name within the module. Finally, if that does not succeed, the interpreter checks if the name is a Python built-in. This is the so-called LGB rule of name resolution: local, then global, then built-in.
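The three lookup steps can be seen in a short sketch. All of the names here (limit and the three functions) are hypothetical and exist only to show each step of the rule.

```python
limit = 10                      # global name in this module

def local_wins():
    limit = 3                   # local name shadows the global one
    return limit

def global_found():
    return limit                # no local 'limit', so the global is used

def builtin_found():
    return len("abc")           # 'len' is neither local nor global here

print(local_wins(), global_found(), builtin_found())
```

Note that assigning to limit inside local_wins() leaves the module-level limit unchanged.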

Caution!

A function can create a new global variable, using the global declaration. However, this practice should be avoided as much as possible. Defining global variables inside a function introduces dependencies on context and limits the portability (or reusability) of the function. In general you should use parameters for function inputs and return values for function outputs.
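To make the contrast concrete, here is a sketch with hypothetical names: configure() creates a module-level variable as a side effect, while scaled() receives everything it needs through its parameters and reports its result through its return value.

```python
def configure():
    global threshold        # creates (or rebinds) a module-level name
    threshold = 0.5

def scaled(value, threshold):
    # Inputs arrive as parameters; the output is the return value.
    return value * threshold

configure()
print(threshold)            # exists only because configure() was called first
print(scaled(10, 0.5))
```

The second function can be moved to another module or tested in isolation; the first cannot be understood without knowing which code later reads threshold.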

Checking Parameter Types

Python does not force us to declare the type of a variable when we write a program, and this permits us to define functions that are flexible about the type of their arguments. For example, a tagger might expect a sequence of words, but it wouldn't care whether this sequence is expressed as a list, a tuple, or an iterator (a new sequence type that we'll discuss below).

However, often we want to write programs for later use by others, and want to program in a defensive style, providing useful warnings when functions have not been invoked correctly. The author of the following tag() function assumed that its argument would always be a string.
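A sketch of such a function is shown here. Only the name tag() and its word parameter come from the text; the toy tagging rule and the sample inputs are assumptions.

```python
def tag(word):
    # Hypothetical toy tagger: a few determiners are 'det', all else is 'noun'.
    if word in ["a", "the", "all"]:
        return "det"
    else:
        return "noun"

print(tag("the"))
print(tag("knight"))
print(tag(["'Tis", "but", "a", "scratch"]))   # a list: no error is raised
```

The last call silently returns a tag for something that is not a word at all, because a list is simply "not in" the list of determiners.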

The function returns sensible values for the arguments 'the' and 'knight', but look what happens when it is passed a list: it fails to complain, even though the result it returns is clearly incorrect. The author of this function could take some extra steps to ensure that the word parameter of the tag() function is a string. A naive approach would be to check the type of the argument using if not type(word) is str, and if word is not a string, to simply return Python's special empty value, None. This is a slight improvement, because the function is checking the type of the argument, and trying to return a "special", diagnostic value for the wrong input. However, it is also dangerous because the calling program may not detect that None is intended as a "special" value, and this diagnostic return value may then be propagated to other parts of the program with unpredictable consequences. This approach also fails if the word is a Unicode string, which in Python 2 has type unicode, not str. Here's a better solution, using an assert statement together with Python 2's basestring type, which generalizes over both unicode and str. (I recall that the Python Cookbook discusses how to check whether an input is a string.)
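A sketch of the assert-based check, adapted for Python 3. In Python 3 all text strings have type str, and basestring and unicode no longer exist, so the check uses str instead. The tagging rule and the error message wording are assumptions.

```python
def tag(word):
    # In Python 3, str plays the role that basestring played in Python 2.
    assert isinstance(word, str), "argument to tag() must be a string"
    if word in ["a", "the", "all"]:
        return "det"
    else:
        return "noun"

try:
    tag(["'Tis", "but", "a", "scratch"])
except AssertionError as err:
    message = str(err)
    print(message)
```

Keep in mind that assert statements are skipped when Python is run with the -O option, so they suit catching programming mistakes during development rather than validating untrusted input.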

If the assert statement fails, it will produce an error that cannot be ignored, since it halts program execution. Additionally, the error message is easy to interpret. Adding assertions to a program helps you find logical errors, and is a kind of defensive programming. A more fundamental approach is to document the parameters to each function using docstrings as described later in this section.

Functional Decomposition

Well-structured programs usually make extensive use of functions. When a block of program code grows longer than 10-20 lines, it is a great help to readability if the code is broken up into one or more functions, each one having a clear purpose. This is analogous to the way a good essay is divided into paragraphs, each expressing one main idea.

Functions provide an important kind of abstraction. They allow us to group multiple actions into a single, complex action, and associate a name with it. (Compare this with the way we combine the actions of go and bring back into a single more complex action fetch.) When we use functions, the main program can be written at a higher level of abstraction, making its structure transparent, e.g.:
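A sketch of a main program written at this level of abstraction. The function names load_corpus(), analyze() and present() are hypothetical, and their bodies are simple stand-ins so that the example runs on its own.

```python
from collections import Counter

def load_corpus():
    # Stand-in data source; a real program might read files here.
    return "the knight saw the rabbit and the rabbit saw the knight".split()

def analyze(data):
    # Count how often each word occurs.
    return Counter(data)

def present(results):
    # Print the three most common words, one per line.
    for word, count in results.most_common(3):
        print(word, count)

# The main program reads as three named steps.
data = load_corpus()
results = analyze(data)
present(results)
```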

Appropriate use of functions makes programs more readable and maintainable. Additionally, it becomes possible to reimplement a function (replacing the function's body with more efficient code) without having to be concerned with the rest of the program.

Consider the freq_words function in Example 4.3. It updates the contents of a frequency distribution that is passed in as a parameter, and it also prints a list of the n most frequent words.

Example 4.3 (code_freq_words1.py): Poorly Designed Function to Compute Frequent Words
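The sketch below illustrates the same design problem in a self-contained form. Two substitutions are assumptions and not the original listing: collections.Counter stands in for the frequency distribution object, and a text parameter stands in for fetching and cleaning a web page.

```python
from collections import Counter

def freq_words(text, freqdist, n):
    # Side effect 1: updates the counter passed in by the caller.
    for word in text.split():
        freqdist[word.lower()] += 1
    # Side effect 2: prints a selection of the results.
    print([word for word, _ in freqdist.most_common(n)])

fd = Counter()
freq_words("The cat saw the dog and the bird", fd, 2)
```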

This function has a number of problems. The function has two side effects: it modifies the contents of its second parameter, and it prints a selection of the results it has computed. The function would be easier to understand and to reuse elsewhere if we initialized the FreqDist() object inside the function (in the same place it is populated), and if we moved the selection and display of results to the calling program. In Example 4.4 we refactor this function, and simplify its interface by providing a single url parameter.

Example 4.4 (code_freq_words2.py): Well-Designed Function to Compute Frequent Words
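A self-contained sketch of the refactored design. As assumptions for illustration, collections.Counter stands in for the frequency distribution object and the single parameter is a text string where the text describes a url.

```python
from collections import Counter

def freq_words(text):
    # The counter is created where it is populated, then returned.
    freqdist = Counter()
    for word in text.split():
        freqdist[word.lower()] += 1
    return freqdist

fd = freq_words("The cat saw the dog and the bird")
# Selection and display are now the caller's responsibility.
top = [word for word, _ in fd.most_common(2)]
print(top)
```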

Note that we have now simplified the work of freq_words to the point that we can do its work with three lines of code:
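The idea can be sketched as follows, assuming a plain string as input and collections.Counter in place of the frequency distribution object; the three lines after the sample text do all the work.

```python
from collections import Counter

text = "The cat saw the dog and the bird"   # stand-in for the downloaded text

words = text.split()
fd = Counter(word.lower() for word in words)
top = [word for word, _ in fd.most_common(3)]
print(top)
```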

Documenting Functions

If we have done a good job at decomposing our program into functions, then it should be easy to describe the purpose of each function in plain language, and provide this in the docstring at the top of the function definition. This statement should not explain how the functionality is implemented; in fact it should be possible to re-implement the function using a different method without changing this statement.

For the simplest functions, a one-line docstring is usually adequate (see Example 4.2). You should provide a triple-quoted string containing a complete sentence on a single line. For non-trivial functions, you should still provide a one-sentence summary on the first line, since many docstring processing tools index this string. This should be followed by a blank line, then a more detailed description of the functionality (see PEP 257 at /dev/peps/pep-0257/ for more information on docstring conventions).
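The layout described above (summary line, blank line, longer description) might look like this. The function normalize() is a hypothetical example, not one taken from the text.

```python
def normalize(text):
    """Collapse runs of whitespace in text into single spaces.

    Leading and trailing whitespace is removed as well. The input
    string is not modified; a new string is returned.
    """
    return " ".join(text.split())

# Tools that index docstrings typically pick up the first line.
summary = normalize.__doc__.splitlines()[0]
print(summary)
```

Notice that the docstring says what the function does, not how: the body could be rewritten with a regular expression without touching the documentation.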

Docstrings can include a doctest block, illustrating the use of the function and the expected output. These can be tested automatically using Python's doctest module. Docstrings should document the type of each parameter to the function, and the return type. At a minimum, that can be done in plain text. However, note that NLTK uses the "epytext" markup language to document parameters. This format can be automatically converted into richly structured API documentation (see /), and includes special handling of certain "fields" such as @param which allow the inputs and outputs of functions to be clearly documented. Example 4.5 illustrates a complete docstring.

Example 4.5 (code_epytext.py): Illustration of a complete docstring, consisting of a one-line summary, a more detailed explanation, a doctest example, and epytext markup specifying the parameters, types, return type, and exceptions.
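A sketch of a docstring with all of the parts listed in the caption, written for Python 3. The accuracy() function and its wording are assumptions chosen for illustration; the epytext fields (@param, @type, @rtype, @raise) follow the markup the text describes.

```python
import doctest

def accuracy(reference, test):
    """
    Calculate the fraction of test items that equal the reference items.

    Given a list of reference values and a corresponding list of test
    values, return the fraction of corresponding values that are equal.

        >>> accuracy(['ADJ', 'N', 'V', 'N'], ['N', 'N', 'V', 'ADJ'])
        0.5

    @param reference: An ordered list of reference values.
    @type reference: C{list}
    @param test: A list of values to compare against the corresponding
        reference values.
    @type test: C{list}
    @rtype: C{float}
    @raise ValueError: If C{reference} and C{test} do not have the
        same length.
    """
    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    num_correct = sum(1 for x, y in zip(reference, test) if x == y)
    return num_correct / len(reference)

# Run the example embedded in the docstring; nothing is printed on success.
doctest.run_docstring_examples(accuracy, {"accuracy": accuracy}, name="accuracy")
```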
