
Collecting Log Data: Prologue. Reviewing Storage and Pipeline Candidates

Hyunie 2022. 11. 7. 21:22

Entire pipeline

Detailed pipeline (DE part)

 Receive logs -> S3 tier 1 (the part under discussion below) -> convert the data to Parquet files and save to tier 2 (S3, Glue) -> ETL to the DW (Redshift, Glue) -> reverse ETL to the service DB (MySQL, Glue)
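
As a rough illustration of the tier 1 -> tier 2 step, here is a minimal sketch using awswrangler (the AWS SDK for pandas); the bucket names, paths, Glue database/table names, and the event_date partition column are all assumptions, and a real Glue job would express the same idea as a job script.

```python
# Minimal sketch of the tier 1 -> tier 2 step; all names are hypothetical.
import awswrangler as wr

# Read raw JSON-lines logs from the tier 1 bucket (hypothetical prefix).
df = wr.s3.read_json("s3://logs-tier1/2022/11/07/", lines=True)

# Write back as partitioned Parquet into tier 2, registering the table
# in the Glue Data Catalog along the way. Assumes the logs carry an
# event_date column to partition on.
wr.s3.to_parquet(
    df=df,
    path="s3://logs-tier2/events/",
    dataset=True,
    database="logs_db",            # hypothetical Glue database
    table="events",                # hypothetical Glue table
    partition_cols=["event_date"],
)
```

Writing with dataset=True keeps the partitions registered in the Glue Data Catalog, so the later ETL steps can query the tier 2 data directly.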

 

Points to consider while working

- reverse ETL batch schedule (see the sketch after this list)

- storage read/write speed

- batch speed
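
For the batch schedule point, here is a minimal sketch of pinning the reverse ETL to a fixed schedule with a Glue scheduled trigger, assuming the reverse ETL runs as a Glue job; the trigger name, job name, and cron expression are all hypothetical.

```python
# Minimal sketch: schedule the reverse ETL Glue job with a cron trigger.
import boto3

glue = boto3.client("glue")

glue.create_trigger(
    Name="reverse-etl-nightly",                      # hypothetical trigger name
    Type="SCHEDULED",
    Schedule="cron(0 3 * * ? *)",                    # every day at 03:00 UTC
    Actions=[{"JobName": "reverse-etl-to-mysql"}],   # hypothetical Glue job
    StartOnCreation=True,
)
```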

 

How logs are collected

Need to check before considering the options below:

Are EBS snapshots stored in S3 automatically? If so, where?

 

Option 1.

 If the name of the log file is fixed: CloudWatch -> S3

 - Need to check the cost

 - Cons: one file per event; no way to transform the data on the way in

 

Option 2.

 Kinesis -> S3 (Kinesis Data Firehose if transformation is needed, otherwise Kinesis Data Streams)

 - Cons: cost?
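
From the producer side, Option 2 could look like the following minimal sketch, assuming a Firehose delivery stream (hypothetical name) already configured to deliver into the tier 1 bucket.

```python
# Minimal sketch: push one event into a Firehose delivery stream.
import json
import boto3

firehose = boto3.client("firehose")

event = {"user_id": 42, "action": "click"}   # hypothetical event payload

# Firehose buffers records and flushes them to S3 in batches; a Lambda
# transform can be attached to the stream if reshaping is needed.
firehose.put_record(
    DeliveryStreamName="event-logs-to-s3",   # hypothetical stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```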

 

Option 3.

 API Gateway -> Lambda -> S3

 - Cons: no message queue in front, and Lambda is billed on every event that calls it
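
A minimal sketch of the Lambda handler for Option 3, assuming an API Gateway proxy integration; the bucket and key scheme are hypothetical. It also shows why cost scales with event volume: one invocation and one S3 object per event.

```python
# Minimal sketch: Lambda behind API Gateway dumps each event body into S3.
import json
import uuid

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # API Gateway proxy integration delivers the request body as a string.
    body = event.get("body") or "{}"
    key = f"raw/{uuid.uuid4()}.json"   # one object per event -> many small files
    s3.put_object(Bucket="logs-tier1", Key=key, Body=body.encode("utf-8"))
    return {"statusCode": 200, "body": json.dumps({"stored": key})}
```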

 

Option 4.

 Fluentd -> S3

 - Cons: can't transform the data

 

ETC

 - Use crontab + the AWS CLI to send the log file to S3 directly, once a day (sketch below)

 - Send ELB access logs to S3 directly (docs)
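
For the crontab idea, here is a minimal sketch of the daily upload as a Python script that cron could invoke, written with boto3 rather than the AWS CLI to keep all the sketches in one language (the effect is the same as `aws s3 cp`); the local path and bucket are assumptions.

```python
# Minimal sketch: daily log upload to S3, meant to be run by cron.
from datetime import date

import boto3

s3 = boto3.client("s3")

log_file = "/var/log/app/events.log"                  # hypothetical local log
key = f"daily/{date.today().isoformat()}/events.log"  # one object per day

s3.upload_file(log_file, "logs-tier1", key)           # hypothetical bucket
```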
