Protobuf - How to Develop a Codegen Plugin

If you use gRPC for transport, you have almost certainly come across Google's Protocol Buffers (Protobuf). Protobuf is a data serialization format: most HTTP APIs today exchange data as JSON, and Protobuf does a similar job.

A Protobuf definition can be turned into code for a target programming language through codegen. For example, suppose I define the following proto file:

syntax = "proto3";

package example;

import "google/protobuf/timestamp.proto";

option go_package = "github.com/KennyChenFight/protogen-demo/proto";

message Hello {
  string greeting = 1;
}

Running codegen on it produces the following code file:

// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.25.0
// protoc v3.12.3
// source: proto/pb.proto

package proto

import (
	proto "github.com/golang/protobuf/proto"
	protoreflect "google.golang.org/protobuf/reflect/protoreflect"
	protoimpl "google.golang.org/protobuf/runtime/protoimpl"
	reflect "reflect"
	sync "sync"
)

const (
	// Verify that this generated code is sufficiently up-to-date.
	_ = protoimpl.EnforceVersion(20 - protoimpl.MinVersion)
	// Verify that runtime/protoimpl is sufficiently up-to-date.
	_ = protoimpl.EnforceVersion(protoimpl.MaxVersion - 20)
)
...

Note the comment at the top: Code generated by protoc-gen-go. This tells us that protoc-gen-go is itself a codegen tool. It comes from here: https://pkg.go.dev/github.com/golang/protobuf/protoc-gen-go

protoc, the protocol buffer compiler, uses this tool to generate Go code that we can use directly.
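protoc finds a plugin purely by name. For an output flag --NAME_out it runs an executable called protoc-gen-NAME, looked up on $PATH unless --plugin=protoc-gen-NAME=<path> points somewhere else. Built-in generators such as --cpp_out are handled inside protoc. The hypothetical helper below only spells out that naming rule as string handling. It is an illustration, not protoc's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// pluginExecutable maps a protoc output flag such as "--go_out=..." to the
// executable name protoc would look for, e.g. "protoc-gen-go".
// Hypothetical helper: built-in generators (cpp, java, python, ...) are
// handled inside protoc and never reach this lookup.
func pluginExecutable(flag string) (string, bool) {
	name, _, _ := strings.Cut(strings.TrimPrefix(flag, "--"), "=")
	if !strings.HasSuffix(name, "_out") {
		return "", false // not an output flag, e.g. -I=.
	}
	return "protoc-gen-" + strings.TrimSuffix(name, "_out"), true
}

func main() {
	for _, f := range []string{"--go_out=.", "--myplugin_out=foo=bar:../generated", "-I=."} {
		exe, ok := pluginExecutable(f)
		fmt.Println(f, "->", exe, ok)
	}
}
```

This is why the plugin binary in this post is named protoc-gen-demo and invoked with --demo_out.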

How to develop a protobuf codegen plugin

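Why learn to develop a protobuf codegen plugin? Sometimes you want to generate files of your own. A protobuf file usually defines your messages and services, so it effectively serves as your API documentation, and codegen lets you produce whatever auxiliary files you need from it. For example, you might generate an OpenAPI document from your proto files for use with a framework like Swagger, or generate HTTP/gRPC client and server code that is ready to use right after codegen. That saves you from rewriting the same boilerplate every time you start a new project.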

In fact, a plugin that generates gRPC Go code already exists: https://github.com/grpc/grpc-go/blob/master/cmd/protoc-gen-go-grpc/main.go

This is a codegen plugin we routinely need when developing with gRPC. So how do you develop your own? Before getting into that, let's look at how protoc works, which makes it clearer how our plugin fits in.

protoc flow

protoc is the protobuf compiler. It is used like this:

protoc \
-I=. \
--myplugin_out="foo=bar:../generated" \
./pkg/*.proto
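  • -I specifies a directory in which to search for imported proto files, and it can be given multiple times (one directory per flag). If my own proto file uses definitions from someone else's proto files, -I tells protoc where to find them so that compilation succeeds.

  • --myplugin_out tells protoc to use the protoc-gen-myplugin plugin, and foo=bar is a parameter passed to the plugin. You can pass several parameters separated by commas, e.g. foo=bar,foo2=bar2:../generated. The part after the colon, ../generated, is the output directory for generated files.

  • ./pkg/*.proto are the proto files protoc should parse. protoc checks their syntax, loads the proto files they import, and turns each of these files and imports into a File Descriptor. What is a File Descriptor? It is defined in descriptor.proto: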

    message FileDescriptorProto {
    optional string name = 1; // file name, relative to root of source tree
    optional string package = 2; // e.g. "foo", "foo.bar", etc.

    // Names of files imported by this file.
    repeated string dependency = 3;
    // Indexes of the public imported files in the dependency list above.
    repeated int32 public_dependency = 10;
    // Indexes of the weak imported files in the dependency list.
    // For Google-internal migration only. Do not use.
    repeated int32 weak_dependency = 11;

    // All top-level definitions in this file.
    repeated DescriptorProto message_type = 4;
    repeated EnumDescriptorProto enum_type = 5;
    repeated ServiceDescriptorProto service = 6;
    repeated FieldDescriptorProto extension = 7;

    optional FileOptions options = 8;

    // This field contains optional information about the original source code.
    // You may safely remove this entire field without harming runtime
    // functionality of the descriptors -- the information is needed only by
    // development tools.
    optional SourceCodeInfo source_code_info = 9;

    // The syntax of the proto file.
    // The supported values are "proto2" and "proto3".
    optional string syntax = 12;
    }

    This is one of the messages in descriptor.proto. FileDescriptorProto describes everything a proto file can contain; in a sense, it is proto being used to define proto.

    After building the descriptors, protoc produces a CodeGeneratorRequest, which is likewise defined in a proto file, plugin.proto:

    // An encoded CodeGeneratorRequest is written to the plugin's stdin.
    message CodeGeneratorRequest {
    // The .proto files that were explicitly listed on the command-line. The
    // code generator should generate code only for these files. Each file's
    // descriptor will be included in proto_file, below.
    repeated string file_to_generate = 1;

    // The generator parameter passed on the command-line.
    optional string parameter = 2;

    // FileDescriptorProtos for all files in files_to_generate and everything
    // they import. The files will appear in topological order, so each file
    // appears before any file that imports it.
    //
    // protoc guarantees that all proto_files will be written after
    // the fields above, even though this is not technically guaranteed by the
    // protobuf wire format. This theoretically could allow a plugin to stream
    // in the FileDescriptorProtos and handle them one by one rather than read
    // the entire set into memory at once. However, as of this writing, this
    // is not similarly optimized on protoc's end -- it will store all fields in
    // memory at once before sending them to the plugin.
    //
    // Type names of fields and extensions in the FileDescriptorProto are always
    // fully qualified.
    repeated FileDescriptorProto proto_file = 15;

    // The version number of protocol compiler.
    optional Version compiler_version = 3;

    }

    The comment above says it clearly: protoc writes an encoded CodeGeneratorRequest to the plugin's stdin. In other words, the protoc command we run is ultimately turned into a CodeGeneratorRequest, which is handed to our plugin. The plugin can then inspect the request and do whatever it needs. This works because the CodeGeneratorRequest carries a lot of information: for example, the repeated FileDescriptorProto proto_file field gives you the contents of every proto file involved.

To sum up, the whole codegen process goes like this:

  1. Define your own proto files.

  2. Compile them with protoc, passing your plugin's command-line flag.

  3. protoc builds a CodeGeneratorRequest and passes it to protoc-gen-myplugin, which then produces a CodeGeneratorResponse.

    The CodeGeneratorResponse is also defined in plugin.proto:

    // The plugin writes an encoded CodeGeneratorResponse to stdout.
    message CodeGeneratorResponse {
    // Error message. If non-empty, code generation failed. The plugin process
    // should exit with status code zero even if it reports an error in this way.
    //
    // This should be used to indicate errors in .proto files which prevent the
    // code generator from generating correct code. Errors which indicate a
    // problem in protoc itself -- such as the input CodeGeneratorRequest being
    // unparseable -- should be reported by writing a message to stderr and
    // exiting with a non-zero status code.
    optional string error = 1;

    // A bitmask of supported features that the code generator supports.
    // This is a bitwise "or" of values from the Feature enum.
    optional uint64 supported_features = 2;

    // Sync with code_generator.h.
    enum Feature {
    FEATURE_NONE = 0;
    FEATURE_PROTO3_OPTIONAL = 1;
    }

    // Represents a single generated file.
    message File {
    // The file name, relative to the output directory. The name must not
    // contain "." or ".." components and must be relative, not be absolute (so,
    // the file cannot lie outside the output directory). "/" must be used as
    // the path separator, not "\".
    //
    // If the name is omitted, the content will be appended to the previous
    // file. This allows the generator to break large files into small chunks,
    // and allows the generated text to be streamed back to protoc so that large
    // files need not reside completely in memory at one time. Note that as of
    // this writing protoc does not optimize for this -- it will read the entire
    // CodeGeneratorResponse before writing files to disk.
    optional string name = 1;

    // If non-empty, indicates that the named file should already exist, and the
    // content here is to be inserted into that file at a defined insertion
    // point. This feature allows a code generator to extend the output
    // produced by another code generator. The original generator may provide
    // insertion points by placing special annotations in the file that look
    // like:
    // @@protoc_insertion_point(NAME)
    // The annotation can have arbitrary text before and after it on the line,
    // which allows it to be placed in a comment. NAME should be replaced with
    // an identifier naming the point -- this is what other generators will use
    // as the insertion_point. Code inserted at this point will be placed
    // immediately above the line containing the insertion point (thus multiple
    // insertions to the same point will come out in the order they were added).
    // The double-@ is intended to make it unlikely that the generated code
    // could contain things that look like insertion points by accident.
    //
    // For example, the C++ code generator places the following line in the
    // .pb.h files that it generates:
    // // @@protoc_insertion_point(namespace_scope)
    // This line appears within the scope of the file's package namespace, but
    // outside of any particular class. Another plugin can then specify the
    // insertion_point "namespace_scope" to generate additional classes or
    // other declarations that should be placed in this scope.
    //
    // Note that if the line containing the insertion point begins with
    // whitespace, the same whitespace will be added to every line of the
    // inserted text. This is useful for languages like Python, where
    // indentation matters. In these languages, the insertion point comment
    // should be indented the same amount as any inserted code will need to be
    // in order to work correctly in that context.
    //
    // The code generator that generates the initial file and the one which
    // inserts into it must both run as part of a single invocation of protoc.
    // Code generators are executed in the order in which they appear on the
    // command line.
    //
    // If |insertion_point| is present, |name| must also be present.
    optional string insertion_point = 2;

    // The file contents.
    optional string content = 15;

    // Information describing the file content being inserted. If an insertion
    // point is used, this information will be appropriately offset and inserted
    // into the code generation metadata for the generated files.
    optional GeneratedCodeInfo generated_code_info = 16;
    }
    repeated File file = 15;
    }

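    It defines a repeated File file field, so the plugin can fill the CodeGeneratorResponse with whatever custom file contents you want! protoc then reads the encoded response from the plugin's stdout and writes each file into the output directory.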

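Drawbacks

There is a catch, though. If you look closely at the descriptors, you will find that extracting the contents of each proto file is tedious: descriptors are deeply nested and have to be walked piece by piece, because each file descriptor can contain message descriptors, service descriptors, and so on: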

// All top-level definitions in this file.
repeated DescriptorProto message_type = 4;
repeated EnumDescriptorProto enum_type = 5;
repeated ServiceDescriptorProto service = 6;
repeated FieldDescriptorProto extension = 7;

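So when writing our own plugin, we have to examine each of these to see which fields it has, how to extract them, and how to parse them to get what we want. On top of that, descriptor.proto itself is written in proto2 syntax, while most proto files today are written in proto3, so making your parsing handle proto3 semantics correctly adds further hassle.

Fortunately, plenty of experts on GitHub have built libraries that wrap the low-level parsing and expose friendlier APIs for plugin development. Below I introduce two of them.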

protoc-gen-star

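Personally, I think this library offers the most features, and it is also the most complex. If your plugin needs to do complicated work, I highly recommend it.

Let's look at the source code: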

// Module describes the interface for a domain-specific code generation module
// that can be registered with the PG* generator.
type Module interface {
	// The Name of the Module, used when establishing the build context and used
	// as the base prefix for all debugger output.
	Name() string

	// InitContext is called on a Module with a pre-configured BuildContext that
	// should be stored and used by the Module.
	InitContext(c BuildContext)

	// Execute is called on the module with the target Files as well as all
	// loaded Packages from the gatherer. The module should return a slice of
	// Artifacts that it would like to be generated.
	Execute(targets map[string]File, packages map[string]Package) []Artifact
}

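This library is built around the concept of a Module, which corresponds to the plugin we discussed earlier. The plugin you design must implement every method of the Module interface above. Name() is the module's name, which is also handy for debugging. InitContext handles setup before the Module's Execute runs; for example, through the BuildContext you can read the parameters we passed to protoc. Execute receives every File (essentially the parsed result of a proto file descriptor) along with information about the packages of those proto files. With that information in hand, you implement your logic inside Execute, for instance deciding which files to generate based on the proto files' contents.

The author also provides ModuleBase to make development easier: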

// ModuleBase provides utility methods and a base implementation for a
// protoc-gen-star Module. ModuleBase should be used as an anonymously embedded
// field of an actual Module implementation. The only methods that need to be
// overridden are Name and Execute.
//
// type MyModule {
// *pgs.ModuleBase
// }
//
// func InitMyModule() *MyModule { return &MyModule{ &pgs.ModuleBase{} } }
//
// func (m *MyModule) Name() string { return "MyModule" }
//
// func (m *MyModule) Execute(...) []pgs.Artifact { ... }
//
type ModuleBase struct {
	BuildContext
	artifacts []Artifact
}

ModuleBase provides debug logging and a base implementation of the Module interface, so we can embed ModuleBase in our own Module struct and choose which of its methods to override.

Now let's look at a full example:
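The embedding trick that ModuleBase relies on is plain Go method promotion. The self-contained sketch below uses simplified, hypothetical types (Module, Base, and MyModule, with strings in place of pgs types) to show the pattern: InitContext is inherited from the embedded base, while Name and Execute are overridden. It does not reproduce the real behavior of pgs.ModuleBase.

```go
package main

import "fmt"

// Module is a simplified, hypothetical mirror of pgs.Module.
type Module interface {
	Name() string
	InitContext(params map[string]string)
	Execute(targets []string) []string
}

// Base plays the role of pgs.ModuleBase: it stores the context and
// provides default method implementations.
type Base struct {
	params map[string]string
}

func (b *Base) InitContext(params map[string]string) { b.params = params }
func (b *Base) Name() string                          { return "base" }
func (b *Base) Execute(targets []string) []string     { return nil }

// MyModule embeds *Base: InitContext and the params field are promoted,
// while Name and Execute defined here take precedence over Base's.
type MyModule struct {
	*Base
}

func (m *MyModule) Name() string { return "mymodule" }

func (m *MyModule) Execute(targets []string) []string {
	var out []string
	for _, t := range targets {
		out = append(out, t+" (foo="+m.params["foo"]+")")
	}
	return out
}

func main() {
	// Like pgs modules, the embedded base pointer must be initialized.
	var mod Module = &MyModule{Base: &Base{}}
	mod.InitContext(map[string]string{"foo": "bar"})
	fmt.Println(mod.Name(), mod.Execute([]string{"proto/pb.proto"}))
}
```

Note that the embedded pointer has to be non-nil, which is why the module constructor in the example below does &JSONifyModule{ModuleBase: &pgs.ModuleBase{}}.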

package module

import (
	"text/template"

	pgs "github.com/lyft/protoc-gen-star"
	pgsgo "github.com/lyft/protoc-gen-star/lang/go"
)

type JSONifyModule struct {
	*pgs.ModuleBase
	ctx pgsgo.Context
	tpl *template.Template
}

func NewJSONifyModule() *JSONifyModule { return &JSONifyModule{ModuleBase: &pgs.ModuleBase{}} }

func (p *JSONifyModule) InitContext(c pgs.BuildContext) {
	p.ModuleBase.InitContext(c)
	p.ctx = pgsgo.InitContext(c.Parameters())

	// Template helpers resolve Go names the same way protoc-gen-go does.
	tpl := template.New("jsonify").Funcs(map[string]interface{}{
		"package": p.ctx.PackageName,
		"name":    p.ctx.Name,
	})

	p.tpl = template.Must(tpl.Parse(jsonifyTpl))
}

func (p *JSONifyModule) Name() string { return "jsonify" }

func (p *JSONifyModule) Execute(targets map[string]pgs.File, pkgs map[string]pgs.Package) []pgs.Artifact {
	for _, t := range targets {
		p.generate(t)
	}

	return p.Artifacts()
}

func (p *JSONifyModule) generate(f pgs.File) {
	// Skip proto files that declare no messages.
	if len(f.Messages()) == 0 {
		return
	}

	// The base name "pb" is hardcoded for this single-file demo:
	// proto/pb.proto -> proto/pb_demo2.json.go
	name := p.ctx.OutputPath(f).SetBase("pb").SetExt("_demo2.json.go")
	p.AddGeneratorTemplateFile(name.String(), p.tpl, f)
}

// The generated code refers to json.Marshaler and json.Unmarshaler, so it
// must import encoding/json in addition to protojson.
const jsonifyTpl = `package {{ package . }}

import (
	"encoding/json"

	"google.golang.org/protobuf/encoding/protojson"
)

{{ range .AllMessages }}

func (m *{{ name . }}) MarshalJSON() ([]byte, error) {
	b, err := protojson.Marshal(m)
	if err != nil {
		return nil, err
	}
	return b, nil
}

var _ json.Marshaler = (*{{ name . }})(nil)

func (m *{{ name . }}) UnmarshalJSON(b []byte) error {
	return protojson.Unmarshal(b, m)
}

var _ json.Unmarshaler = (*{{ name . }})(nil)

{{ end }}
`
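  1. I adapted this example from one of the examples in protoc-gen-star.

  2. This Module, JSONifyModule, generates MarshalJSON/UnmarshalJSON methods that use the protojson package, so that protobuf messages are converted to JSON correctly (encoding/json applied directly to generated structs does not follow the proto3 JSON mapping).

  3. In InitContext, calling ModuleBase's InitContext enables debug logging, and pgsgo.InitContext gives you the Go-specific information you need, shown below: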

    // Context resolves Go-specific language for Packages & Entities generated by
    // protoc-gen-go. The rules that drive the naming behavior are complicated, and
    // result from an interplay of the go_package file option, the proto package,
    // and the proto filename itself. Therefore, it is recommended that all proto
    // files that are targeting Go should include a fully qualified go_package
    // option. These must be consistent for all proto files that are intended to be
    // in the same Go package.
    type Context interface {
        // Params returns the Parameters associated with this context.
        Params() pgs.Parameters

        // Name returns the name of a Node as it would appear in the generation output
        // of protoc-gen-go. For each type, the following is returned:
        //
        //   - Package: the Go package name
        //   - File: the Go package name
        //   - Message: the struct name
        //   - Field: the field name on the Message struct
        //   - OneOf: the field name on the Message struct
        //   - Enum: the type name
        //   - EnumValue: the constant name
        //   - Service: the server interface name
        //   - Method: the method name on the server and client interface
        //
        Name(node pgs.Node) pgs.Name

        // ServerName returns the name of the server interface for the Service.
        ServerName(service pgs.Service) pgs.Name

        // ClientName returns the name of the client interface for the Service.
        ClientName(service pgs.Service) pgs.Name

        // ServerStream returns the name of the grpc.ServerStream wrapper for this
        // method. This name is only used if client or server streaming is
        // implemented for this method.
        ServerStream(method pgs.Method) pgs.Name

        // OneofOption returns the struct name that wraps a OneOf option's value. These
        // messages contain one field, matching the value returned by Name for this
        // Field.
        OneofOption(field pgs.Field) pgs.Name

        // TypeName returns the type name of a Field as it would appear in the
        // generated message struct from protoc-gen-go. Fields from imported
        // packages will be prefixed with the package name.
        Type(field pgs.Field) TypeName

        // PackageName returns the name of the Node's package as it would appear in
        // Go source generated by the official protoc-gen-go plugin.
        PackageName(node pgs.Node) pgs.Name

        // ImportPath returns the Go import path for an entity as it would be
        // included in an import block in a Go file. This value is only appropriate
        // for Entities imported into a target file/package.
        ImportPath(entity pgs.Entity) pgs.FilePath

        // OutputPath returns the output path relative to the plugin's output destination
        OutputPath(entity pgs.Entity) pgs.FilePath
    }

    Its naming logic is modeled on protoc-gen-go, as the comment above says:

    // Context resolves Go-specific language for Packages & Entities generated by
    // protoc-gen-go. ...

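    Also in InitContext, you can create the template for the file you want to generate up front, so it is ready to use in Execute.

  4. In Execute, you loop over the proto files and run your generate implementation. The two lines to pay attention to are: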

    name := p.ctx.OutputPath(f).SetBase("pb").SetExt("_demo2.json.go")
    p.AddGeneratorTemplateFile(name.String(), p.tpl, f)
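    • OutputPath returns a path relative to the plugin's output destination, that is, the directory you pass in --myplugin_out=, so here you can set the base name and extension of the output file.

    • AddGeneratorTemplateFile is a convenience method that wraps the template, the file name, and the proto file (as the template data) into an Artifact: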

      func (m *ModuleBase) AddGeneratorTemplateFile(name string, tpl Template, data interface{}) {
          m.AddArtifact(GeneratorTemplateFile{
              Name: name,
              TemplateArtifact: TemplateArtifact{
                  Template: tpl,
                  Data:     data,
              },
          })
      }

      An Artifact is the output produced by a Module:

      // An Artifact describes the output for a Module. Typically this is the creation
      // of a file either directly against the file system or via protoc.
      type Artifact interface {
          artifact()
      }

      The workflow runs each Module's Execute and collects the artifacts that every Module outputs. The code makes this clearer:

      func (wf *standardWorkflow) Run(ast AST) (arts []Artifact) {
          ctx := Context(wf.Debugger, wf.params, wf.params.OutputPath())

          wf.Debug("initializing modules")
          for _, m := range wf.mods {
              m.InitContext(ctx.Push(m.Name()))
          }

          wf.Debug("executing modules")
          for _, m := range wf.mods {
              arts = append(arts, m.Execute(ast.Targets(), ast.Packages())...)
          }

          return
      }

      These artifacts are then turned into a CodeGeneratorResponse:

      func (wf *standardWorkflow) Persist(arts []Artifact) {
          resp := wf.persister.Persist(arts...)

          data, err := proto.Marshal(resp)
          wf.CheckErr(err, "marshaling output proto")

          n, err := wf.out.Write(data)
          wf.CheckErr(err, "writing output proto")
          wf.Assert(len(data) == n, "failed to write all output")

          wf.Debug("rendering successful")
      }

      The Persist implementation makes this even clearer:

      func (p *stdPersister) Persist(arts ...Artifact) *plugin_go.CodeGeneratorResponse {
          resp := new(plugin_go.CodeGeneratorResponse)
          resp.SupportedFeatures = p.supportedFeatures

          for _, a := range arts {
              switch a := a.(type) {
              case GeneratorFile:
                  ...
              case GeneratorTemplateFile:
                  f, err := a.ProtoFile()
                  p.CheckErr(err, "unable to convert ", a.Name, " to proto")
                  f.Content = proto.String(p.postProcess(a, f.GetContent()))
                  p.insertFile(resp, f, a.Overwrite)
              case GeneratorAppend:
                  ...
              case GeneratorTemplateAppend:
                  ...
              case GeneratorInjection:
                  ...
              case CustomFile:
                  ...
              case CustomTemplateFile:
                  ...
              case GeneratorError:
                  ...
              default:
                  p.Failf("unrecognized artifact type: %T", a)
              }
          }

          return resp
      }

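      It switches on the kind of artifact you generated and inserts each file's contents into resp.

      At this point resp is fully populated. It is then marshaled and written to wf.out (stdout by default), and protoc takes care of writing the files to disk: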

      data, err := proto.Marshal(resp)
      wf.CheckErr(err, "marshaling output proto")

      n, err := wf.out.Write(data)
      wf.CheckErr(err, "writing output proto")
      wf.Assert(len(data) == n, "failed to write all output")

      wf.Debug("rendering successful")

      Pretty interesting!

    Finally, let's see how to write your main func:

    func main() {
        optionalFeature := uint64(pluginpb.CodeGeneratorResponse_FEATURE_PROTO3_OPTIONAL)
        pgs.Init(
            pgs.DebugMode(),
            pgs.SupportedFeatures(&optionalFeature),
        ).RegisterModule(
            module.NewJSONifyModule(),
        ).RegisterPostProcessor(
            pgsgo.GoFmt(),
        ).Render()
    }
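    1. optionalFeature: proto3 now supports the optional keyword. Declaring this feature tells protoc that your plugin supports it; otherwise protoc refuses to run the plugin on proto3 files that use optional. You can refer to this document for details.

    2. In Init you can choose to enable DebugMode, which prints debug logs that help with tracing.

    3. RegisterModule can register multiple Modules.

    4. PostProcessor hasn't come up yet: sometimes we want to do something to the artifacts before they are written to disk. For example, the library provides a go fmt post-processor so that the generated Go files are properly formatted, which is quite handy.

    5. Finally, Render carries out the plugin side of the protoc flow described above: reading the request, running the modules, and writing the response. In practice, you first build your own protoc-gen-demo executable and then run protoc: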

      go build -o ./protoc-gen-demo/protoc-gen-demo.bin ./protoc-gen-demo
      protoc \
      --plugin=protoc-gen-demo=./protoc-gen-demo/protoc-gen-demo.bin \
      --go_out=paths=source_relative:. \
      --demo_out=foo=bar,paths=source_relative:. \
      proto/pb.proto

      One more thing: paths is a special parameter. Let's take a look:

      func (c context) OutputPath(e pgs.Entity) pgs.FilePath {
          out := e.File().InputPath().SetExt(".pb.go")

          // source relative doesn't try to be fancy
          if Paths(c.p) == SourceRelative {
              return out
          }

          path, _ := c.optionPackage(e)

          // Import relative ignores the existing file structure
          return pgs.FilePath(path).Push(out.Base())
      }
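      1. paths is a special parameter that takes one of two values: import or source_relative.

      2. The default is import, meaning the output path follows the full Go package path; for example, it creates github.com/KennyChenFight/protogen-demo/proto inside your project. With source_relative, the file is generated alongside the proto file, following the proto file's path.

      3. One thing to watch out for:

        option go_package in the proto file and paths=source_relative:. on the command line are meant to be used together. For example, with protoc-gen-go:

        • option go_package gives the generated Go code the correct package name and import path.
        • --go_out=paths=source_relative:. generates the Go code in the same directory as the proto source file.

        If you don't define go_package and use paths=source_relative, the generated Go code falls back to the value of the proto file's package. I don't recommend relying on that, because package describes the proto file's domain (its protobuf namespace) rather than a Go package, and protoc-gen-go itself warns that a future release will require go_package: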

        2022/07/10 11:53:20 WARNING: Missing 'go_package' option in "proto/pb.proto",
        please specify it with the full Go package path as
        a future release of protoc-gen-go will require this be specified.
        See https://developers.google.com/protocol-buffers/docs/reference/go-generated#package for more information.

    That wraps up developing a plugin with protoc-gen-star.
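The paths rule above can be modeled as a small pure function. This is a simplified sketch of the OutputPath logic shown earlier, assuming go_package holds an import path optionally followed by ;name. outputPath is hypothetical and does not handle a missing go_package or the M flags.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// outputPath models where a generated .pb.go file lands. The .proto path
// gets a .pb.go extension; with paths=source_relative it stays next to the
// .proto file, otherwise it goes under the go_package import path.
func outputPath(protoFile, goPackage string, sourceRelative bool) string {
	out := strings.TrimSuffix(protoFile, path.Ext(protoFile)) + ".pb.go"
	if sourceRelative {
		return out
	}
	// go_package may carry an explicit package name: "import/path;name".
	importPath, _, _ := strings.Cut(goPackage, ";")
	return path.Join(importPath, path.Base(out))
}

func main() {
	const goPkg = "github.com/KennyChenFight/protogen-demo/proto"
	fmt.Println(outputPath("proto/pb.proto", goPkg, true))  // proto/pb.pb.go
	fmt.Println(outputPath("proto/pb.proto", goPkg, false)) // github.com/KennyChenFight/protogen-demo/proto/pb.pb.go
}
```

This is why the default import mode creates a github.com/... directory tree in the project, while source_relative writes the file next to proto/pb.proto.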

protogen

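If the library above feels too complicated, there is another option: the official Go protobuf API (google.golang.org/protobuf), which lets us develop a plugin quickly.

The v2 Go protobuf API was released in 2020, and it includes the protogen package for writing plugins.

Looking at the code makes it clearer: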

// A Plugin is a protoc plugin invocation.
type Plugin struct {
	// Request is the CodeGeneratorRequest provided by protoc.
	Request *pluginpb.CodeGeneratorRequest

	// Files is the set of files to generate and everything they import.
	// Files appear in topological order, so each file appears before any
	// file that imports it.
	Files       []*File
	FilesByPath map[string]*File

	// SupportedFeatures is the set of protobuf language features supported by
	// this generator plugin. See the documentation for
	// google.protobuf.CodeGeneratorResponse.supported_features for details.
	SupportedFeatures uint64

	fileReg        *protoregistry.Files
	enumsByName    map[protoreflect.FullName]*Enum
	messagesByName map[protoreflect.FullName]*Message
	annotateCode   bool
	pathType       pathType
	module         string
	genFiles       []*GeneratedFile
	opts           Options
	err            error
}

First, the Plugin struct. It holds the CodeGeneratorRequest along with the information you need about the proto files, namely the Files field. Take a closer look at the File struct:

// A File describes a .proto source file.
type File struct {
	Desc  protoreflect.FileDescriptor
	Proto *descriptorpb.FileDescriptorProto

	GoDescriptorIdent GoIdent       // name of Go variable for the file descriptor
	GoPackageName     GoPackageName // name of this file's Go package
	GoImportPath      GoImportPath  // import path of this file's Go package

	Enums      []*Enum      // top-level enum declarations
	Messages   []*Message   // top-level message declarations
	Extensions []*Extension // top-level extension declarations
	Services   []*Service   // top-level service declarations

	Generate bool // true if we should generate code for this file

	// GeneratedFilenamePrefix is used to construct filenames for generated
	// files associated with this source file.
	//
	// For example, the source file "dir/foo.proto" might have a filename prefix
	// of "dir/foo". Appending ".pb.go" produces an output file of "dir/foo.pb.go".
	GeneratedFilenamePrefix string

	location Location
}

It has already parsed the messy descriptors into structures such as Message, Service, and Enum, which are convenient to use:

// A Message describes a message.
type Message struct {
	Desc protoreflect.MessageDescriptor

	GoIdent GoIdent // name of the generated Go type

	Fields []*Field // message field declarations
	Oneofs []*Oneof // message oneof declarations

	Enums      []*Enum      // nested enum declarations
	Messages   []*Message   // nested message declarations
	Extensions []*Extension // nested extension declarations

	Location Location   // location of this message
	Comments CommentSet // comments associated with this message
}

What about the command-line parameters protoc passes in? Those are handled through the Options struct:

type Options struct {
	// If ParamFunc is non-nil, it will be called with each unknown
	// generator parameter.
	//
	// Plugins for protoc can accept parameters from the command line,
	// passed in the --<lang>_out protoc, separated from the output
	// directory with a colon; e.g.,
	//
	//   --go_out=<param1>=<value1>,<param2>=<value2>:<output_directory>
	//
	// Parameters passed in this fashion as a comma-separated list of
	// key=value pairs will be passed to the ParamFunc.
	//
	// The (flag.FlagSet).Set method matches this function signature,
	// so parameters can be converted into flags as in the following:
	//
	//   var flags flag.FlagSet
	//   value := flags.Bool("param", false, "")
	//   opts := &protogen.Options{
	//     ParamFunc: flags.Set,
	//   }
	//   protogen.Run(opts, func(p *protogen.Plugin) error {
	//     if *value { ... }
	//   })
	ParamFunc func(name, value string) error

	// ImportRewriteFunc is called with the import path of each package
	// imported by a generated file. It returns the import path to use
	// for this package.
	ImportRewriteFunc func(GoImportPath) GoImportPath
}

Next, your plugin's custom logic runs inside the Run method that Options provides:

// Run executes a function as a protoc plugin.
//
// It reads a CodeGeneratorRequest message from os.Stdin, invokes the plugin
// function, and writes a CodeGeneratorResponse message to os.Stdout.
//
// If a failure occurs while reading or writing, Run prints an error to
// os.Stderr and calls os.Exit(1).
func (opts Options) Run(f func(*Plugin) error) {
	if err := run(opts, f); err != nil {
		fmt.Fprintf(os.Stderr, "%s: %v\n", filepath.Base(os.Args[0]), err)
		os.Exit(1)
	}
}

You pass in a function that wraps your logic, and that function is where the plugin's customization happens!

The run implementation makes it even clearer:

func run(opts Options, f func(*Plugin) error) error {
	if len(os.Args) > 1 {
		return fmt.Errorf("unknown argument %q (this program should be run by protoc, not directly)", os.Args[1])
	}
	in, err := ioutil.ReadAll(os.Stdin)
	if err != nil {
		return err
	}
	req := &pluginpb.CodeGeneratorRequest{}
	if err := proto.Unmarshal(in, req); err != nil {
		return err
	}
	gen, err := opts.New(req)
	if err != nil {
		return err
	}
	if err := f(gen); err != nil {
		// Errors from the plugin function are reported by setting the
		// error field in the CodeGeneratorResponse.
		//
		// In contrast, errors that indicate a problem in protoc
		// itself (unparsable input, I/O errors, etc.) are reported
		// to stderr.
		gen.Error(err)
	}
	resp := gen.Response()
	out, err := proto.Marshal(resp)
	if err != nil {
		return err
	}
	if _, err := os.Stdout.Write(out); err != nil {
		return err
	}
	return nil
}

Having read the code, everything makes more sense. Putting it together, the whole flow looks like this:

func main() {
	var flags flag.FlagSet
	fooValue := flags.String("foo", "default", "")

	protogen.Options{ParamFunc: flags.Set}.Run(func(gen *protogen.Plugin) error {
		// Debug output goes to stderr: stdout carries the response to protoc.
		fmt.Fprintln(os.Stderr, "fooValue:", *fooValue)
		for _, f := range gen.Files {
			if !f.Generate {
				continue // imported dependencies are not generated
			}
			fmt.Fprintf(os.Stderr, "Fprintln: %v\n", "start generate file")
			generateFile(gen, f)
			fmt.Fprintf(os.Stderr, "Fprintln: %v\n", "end generate file")
		}
		return nil
	})
}
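  1. Define the flag parameters the options should accept.
  2. Write your generation logic.
  3. Hand it to Run and let it go!
  4. And that completes the whole protoc flow.
  5. You might wonder why debug output goes to os.Stderr here. The whole protoc flow communicates over stdin and stdout, so you must not print to stdout; doing so would corrupt the data exchanged between protoc and the plugin.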

Is there a better way to debug, one that doesn't rely on stderr and doesn't require rebuilding and rerunning the plugin for every test? Looking at the whole protoc flow, what the plugin really needs is the CodeGeneratorRequest. If I dump the CodeGeneratorRequest data once, I can test the plugin against it without running protoc every time to produce a new one. Here's how:

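package main

import (
	"io"
	"log"
	"os"
	"path/filepath"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/pluginpb"
)

// protoc-gen-debug: dumps the raw CodeGeneratorRequest to disk so a plugin
// can later be tested against it without rerunning protoc.
// log writes to stderr, so stdout stays clean for protoc.
func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		log.Fatal("unable to read input: ", err)
	}

	req := &pluginpb.CodeGeneratorRequest{}
	if err = proto.Unmarshal(data, req); err != nil {
		log.Fatal("unable to unmarshal request: ", err)
	}

	// The parameter (the part before ":" in --debug_out) is the dump directory.
	path := req.GetParameter()
	if path == "" {
		log.Fatal(`please execute the plugin with the output path to properly write the output file: --debug_out="{PATH}:{PATH}"`)
	}

	if err = os.MkdirAll(path, 0755); err != nil {
		log.Fatal("unable to create output dir: ", err)
	}

	// Write the original bytes so the dump is exactly what protoc sent.
	if err = os.WriteFile(filepath.Join(path, "code_generator_request.pb.bin"), data, 0644); err != nil {
		log.Fatal("unable to write request to disk: ", err)
	}
}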

This is adapted from an existing reference implementation. It is quite simple: read stdin, unmarshal it into a CodeGeneratorRequest, and dump the data to disk.

Usage:

protoc \
  --plugin=protoc-gen-debug=./protoc-gen-debug/protoc-gen-debug.bin \
  --go_out=paths=source_relative:. \
  --debug_out=.:. \
  proto/pb.proto

Then:

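package main

import (
	"log"
	"os"

	"google.golang.org/protobuf/compiler/protogen"
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/pluginpb"
)

// Replays a dumped CodeGeneratorRequest through the plugin logic.
// Test is your own plugin function (the one you would pass to Run),
// assumed to be defined elsewhere in this package.
func main() {
	data, err := os.ReadFile("./code_generator_request.pb.bin")
	if err != nil {
		log.Fatal("unable to read input: ", err)
	}

	req := &pluginpb.CodeGeneratorRequest{}
	if err = proto.Unmarshal(data, req); err != nil {
		log.Fatal("unable to unmarshal request: ", err)
	}

	options := protogen.Options{}
	plugin, err := options.New(req)
	if err != nil {
		log.Fatal("unable to create plugin: ", err)
	}
	if err := Test(plugin); err != nil {
		plugin.Error(err)
	}
	resp := plugin.Response()
	out, err := proto.Marshal(resp)
	if err != nil {
		log.Fatal("unable to marshal response: ", err)
	}
	if _, err := os.Stdout.Write(out); err != nil {
		log.Fatal("unable to write response: ", err)
	}
}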

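After unmarshaling the CodeGeneratorRequest and creating our plugin from it, we can exercise our Test function. Test is the function you would otherwise pass to Run, so you no longer need to rerun protoc for every test.

Of course, protoc-gen-star makes this even more convenient, since it has all of this wrapped up for you: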

func TestModule(t *testing.T) {
	req, err := os.Open("./code_generator_request.pb.bin")
	if err != nil {
		t.Fatal(err)
	}
	defer req.Close()

	fs := afero.NewMemMapFs()
	res := &bytes.Buffer{}

	// MyModule is assumed to embed *pgs.ModuleBase, which must be non-nil.
	pgs.Init(
		pgs.ProtocInput(req),  // use the pre-generated request
		pgs.ProtocOutput(res), // capture CodeGeneratorResponse
		pgs.FileSystem(fs),    // capture any custom files written directly to disk
	).RegisterModule(&MyModule{ModuleBase: &pgs.ModuleBase{}}).Render()

	// check res and the fs for output
}

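Summary

That wraps up how to develop a protobuf codegen plugin. My current take: if you only need something simple, the official protogen package is enough, but in terms of convenience and features, protoc-gen-star is the better fit. All the examples from this post are here: https://github.com/KennyChenFight/protogen-demo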