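A First Hands-On with NiFi
At my manager's request, I need to start using the NiFi tool.
Adding and configuring the first processor: GetFile
Add the processor
Change the processor's name
- How to open the processor's configuration
Change the processor's properties
The properties of every processor are documented on the official site: http://nifi.apache.org/docs.html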
| Name | Default Value | Allowable Values | Description |
|---|---|---|---|
| Input Directory | | | The input directory from which to pull files. Supports Expression Language: true (will be evaluated using variable registry only) |
| File Filter | [^.].* | | Only files whose names match the given regular expression will be picked up |
| Path Filter | | | When Recurse Subdirectories is true, then only subdirectories whose path matches the given regular expression will be scanned |
| Batch Size | 10 | | The maximum number of files to pull in each iteration |
| Keep Source File | false | true/false | If true, the file is not deleted after it has been copied to the Content Repository; this causes the file to be picked up continually and is useful for testing purposes. If not keeping the original, NiFi will need write permissions on the directory it is pulling from, otherwise it will ignore the file. |
| Recurse Subdirectories | true | true/false | Indicates whether or not to pull files from subdirectories |
| Polling Interval | 0 sec | | Indicates how long to wait before performing a directory listing |
| Ignore Hidden Files | true | true/false | Indicates whether or not hidden files should be ignored |
| Minimum File Age | 0 sec | | The minimum age that a file must be in order to be pulled; any file younger than this amount of time (according to last modification date) will be ignored |
| Maximum File Age | | | The maximum age that a file must be in order to be pulled; any file older than this amount of time (according to last modification date) will be ignored |
| Minimum File Size | 0 B | | The minimum size that a file must be in order to be pulled |
| Maximum File Size | | | The maximum size that a file can be in order to be pulled |
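To get a feel for the default File Filter `[^.].*`, here is a small sketch that tests a few file names against the same pattern with `grep`. The helper name `matches_default_filter` and the sample names are made up for illustration, and `grep -x` is used as a rough stand-in for a full-name regular expression match, so treat it as an approximation of what the processor does rather than its actual logic.

```shell
# Hypothetical helper: succeeds when the whole name matches [^.].*
# (the first character is not a dot, followed by anything).
matches_default_filter() {
  printf '%s\n' "$1" | grep -Eqx '[^.].*'
}

# Sample names, for illustration only.
for name in hello-word.txt .hidden notes.log; do
  if matches_default_filter "$name"; then
    echo "$name: picked up"
  else
    echo "$name: skipped"
  fi
done
```

With these sample names, only `.hidden` is reported as skipped, because the pattern requires the first character to be something other than a dot.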
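Resolving the error messages
Back on the canvas, hover over the exclamation mark icon on the processor.
Two error messages appear.
The first says, roughly, that the directory does not exist on the server.
The second says, roughly, that the processor is not connected to any upstream or downstream component.
Steps to fix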
[root@localhost ~]# mkdir -p /export/tmp/source # create the directory, including any missing parent directories
[root@localhost ~]# ll /export/tmp/source/ # check that the directory was created
total 0
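Since the Keep Source File description says NiFi needs write permission on the directory it pulls from, it can be worth checking more than just existence. The following is a minimal sketch; `check_input_dir` is a hypothetical helper name, and the check is only meaningful if you run it as the same OS user that runs NiFi (an assumption about your setup).

```shell
# Hypothetical helper: verify that a directory exists and that the
# current user can list it (r, x) and delete files from it (w).
check_input_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
    echo "insufficient permissions: $dir"
    return 1
  fi
  echo "ok: $dir"
}

# Example call for the directory used in this post:
# check_input_dir /export/tmp/source
```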
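Refresh the canvas.
The second error is caused by the missing connection. We leave it unresolved for now, because the next steps connect this processor to another one.
Creating the second processor: PutFile
Create it the same way as GetFile.
Change the processor's name
Change the processor's properties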
| Name | Default Value | Allowable Values | Description |
|---|---|---|---|
| Directory | | | The directory to which files should be written. You may use expression language such as /aa/bb/${path}. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
| Conflict Resolution Strategy | fail | replace/ignore/fail | Indicates what should happen when a file with the same name already exists in the output directory |
| Create Missing Directories | true | true/false | If true, then missing destination directories will be created. If false, flowfiles are penalized and sent to failure. |
| Maximum File Count | | | Specifies the maximum number of files that can exist in the output directory |
| Last Modified Time | | | Sets the lastModifiedTime on the output file to the value of this attribute. Format must be yyyy-MM-dd'T'HH:mm:ssZ. You may also use expression language such as ${file.lastModifiedTime}. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
| Permissions | | | Sets the permissions on the output file to the value of this attribute. Format must be either UNIX rwxrwxrwx with a - in place of denied permissions (e.g. rw-r--r--) or an octal number (e.g. 644). You may also use expression language such as ${file.permissions}. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
| Owner | | | Sets the owner on the output file to the value of this attribute. You may also use expression language such as ${file.owner}. Note on many operating systems NiFi must be running as a super-user to have the permissions to set the file owner. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
| Group | | | Sets the group on the output file to the value of this attribute. You may also use expression language such as ${file.group}. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
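The Permissions property accepts either the rwx form or an octal number, and the two examples in its description (rw-r--r-- and 644) describe the same mode. A quick way to convince yourself is the sketch below, which applies an octal mode to a scratch file and prints how `ls` renders it. The helper name `mode_string` is made up for illustration, and this only shows the shell side of things, not how NiFi applies the value.

```shell
# Hypothetical helper: apply an octal mode to a temporary file and
# print the first 10 characters of `ls -l` (file type + permissions).
mode_string() {
  tmp=$(mktemp)
  chmod "$1" "$tmp"
  ls -l "$tmp" | cut -c1-10
  rm -f "$tmp"
}

mode_string 644   # prints -rw-r--r--
```

The leading `-` in the output is the file type column of `ls -l`; the nine characters after it are the part the property expects.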
Create the directory that files will be written to
[root@localhost ~]# mkdir -p /export/tmp/target
[root@localhost ~]# ll /export/tmp/target
total 0
Set the processor's properties (PutFile)
Note: I mistyped the directory here (I left out a "p"). It should be /export/tmp/target/.
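Connecting the two processors
- Hover over the processor and an arrow icon pointing down and to the right appears
- Hold the left mouse button on that icon and drag it to the right, onto the processor you want to connect to
- After the connection succeeds
- The result
Testing the connection
Start GetFile
Create a file to transfer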
[root@localhost ~]# cd /export/tmp/source/ # change into the source directory
[root@localhost source]# ls # the directory is empty
[root@localhost source]#
[root@localhost source]# echo "hello word " > hello-word.txt # create a file
[root@localhost source]# ls # the file is gone: the running GetFile processor has already read it and deleted the source file
[root@localhost source]#
### Write a few more files to test
[root@localhost source]# echo "hello word1 " > hello-word.txt
[root@localhost source]# ls # still no file, for the same reason
If the canvas shows no change, refresh it.
[root@localhost source]# echo "hello word1 " > hello-word2.txt
[root@localhost source]# ls
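Because a running GetFile removes files almost as soon as they appear, a larger file written directly into the source directory could in principle be picked up before it is fully written. One common precaution, sketched below, is to write under a dot-prefixed temporary name (which the default File Filter and Ignore Hidden Files settings skip) and then rename it. The helper `drop_file` and the `.tmp` suffix are my own illustrative choices, not anything NiFi requires.

```shell
# Hypothetical helper: write CONTENT to DIR/NAME via a hidden temp file,
# then rename it so the final name appears in a single step.
drop_file() {
  dir=$1
  name=$2
  content=$3
  tmp="$dir/.$name.tmp"
  printf '%s\n' "$content" > "$tmp" && mv "$tmp" "$dir/$name"
}

# Example call for the directory used in this post:
# drop_file /export/tmp/source hello-word3.txt "hello word3"
```

For the tiny `echo` files used here it makes no practical difference; the pattern matters more once the files take noticeable time to write.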
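Starting the PutFile processor
Right-clicking the PutFile processor shows no Start option. That is because the processor still reports errors and cannot be started yet, as shown below.
The errors are:
Resolving the errors on the PutFile processor
Error 1
Error 2
The cause is as follows; ticking these two checkboxes resolves it.
Solution
To change a processor's configuration, the processor must be in the "stop" state.
Start PutFile
The following error is reported:
The cause is that a file with the same name already exists in the target directory.
Verifying the error
## Note: I mistyped the destination directory in the PutFile processor above; change it to /export/tmp/target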
[root@localhost target]# ls # a file with the same name does indeed already exist
hello-word2.txt hello-word.txt
[root@localhost target]# pwd
/export/tmp/target
[root@localhost target]# cat hello-word2.txt
hello word1
[root@localhost target]# cat hello-word.txt
hello word
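This happens because the Conflict Resolution Strategy property of our PutFile processor is set to fail, which means the write fails when a file with the same name already exists. The allowable values are replace, ignore, and fail; choosing replace overwrites the existing file. Feel free to test it yourself.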
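To make the three Conflict Resolution Strategy values more concrete, here is a rough shell analogy. It is not NiFi code: `put_file` is a hypothetical function, and I am assuming that ignore leaves the existing file untouched while still counting as a success, which is how I understand the property.

```shell
# put_file STRATEGY SRC DEST_DIR
# A shell analogy for replace / ignore / fail, not NiFi's implementation.
put_file() {
  strategy=$1
  src=$2
  dest_dir=$3
  target="$dest_dir/$(basename "$src")"
  if [ -e "$target" ]; then
    case "$strategy" in
      fail)    echo "fail: $target already exists" >&2; return 1 ;;
      ignore)  return 0 ;;   # keep the existing file, report success
      replace) ;;            # continue and overwrite below
      *)       echo "unknown strategy: $strategy" >&2; return 2 ;;
    esac
  fi
  cp "$src" "$target"
}
```

In the test above, the second hello-word.txt hit the fail branch: the first copy was already sitting in /export/tmp/target, so the write was rejected instead of overwriting it.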
Closing words
Keep at it and you will see a better version of yourself.