Collecting log data: Prologue. Reviewing storage and pipeline candidates (Storage lists + Pipeline)
Project/Collecting Event Data · 2022. 11. 7. 21:22
Entire pipeline
Detailed pipeline (DE-part pipeline)
(receive log -> S3 tier 1; this part is still under discussion) -> convert data to Parquet files and save to tier 2 (S3, Glue) -> ETL to the DW (Redshift, Glue) -> reverse ETL to the service DB (MySQL, Glue)
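The tier-1/tier-2 split above implies a partitioned S3 key layout that Glue and Redshift Spectrum can prune by date. A minimal sketch of one possible layout (the bucket prefixes and the `event_type`/`dt=` partition scheme are assumptions, not part of the original design):

```python
from datetime import datetime, timezone

# Hypothetical key layout: raw JSON lands in tier 1, converted Parquet in
# tier 2, both partitioned by event type and date for partition pruning.
def tier1_key(event_type: str, ts: datetime) -> str:
    return f"tier1/{event_type}/dt={ts:%Y-%m-%d}/{ts:%Y%m%dT%H%M%S}.json"

def tier2_key(event_type: str, ts: datetime) -> str:
    return f"tier2/{event_type}/dt={ts:%Y-%m-%d}/{ts:%Y%m%dT%H%M%S}.parquet"

ts = datetime(2022, 11, 7, 21, 22, tzinfo=timezone.utc)
print(tier1_key("login", ts))  # tier1/login/dt=2022-11-07/20221107T212200.json
```

Keeping the same `dt=` partition key in both tiers makes the tier-1 -> tier-2 conversion batch a simple per-partition job.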
Points to consider while working
- reverse ETL batch schedule
- storage read/write speed
- batch speed
How logs are collected
To check before considering the options below:
Are EBS snapshots stored in S3 automatically? If so, where?
Option1.
If the log file name is fixed: CloudWatch -> S3
- Need to check cost
- Cons: one file per event; no transformation possible
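Option 1 can be driven with the CloudWatch Logs export API, which copies a log group into an S3 bucket as a one-shot asynchronous task. A hedged sketch (the bucket name and prefix are hypothetical; the real bucket would also need a policy allowing CloudWatch Logs to write):

```python
from datetime import datetime, timezone

def to_epoch_ms(ts: datetime) -> int:
    """CloudWatch Logs export takes fromTime/to as epoch milliseconds."""
    return int(ts.timestamp() * 1000)

def export_log_group(log_group: str, bucket: str,
                     start: datetime, end: datetime) -> str:
    """Start a one-shot export of a log group to S3; returns the task id."""
    import boto3  # deferred so to_epoch_ms stays importable without AWS deps
    logs = boto3.client("logs")
    resp = logs.create_export_task(
        taskName=f"export-{start:%Y%m%d}",
        logGroupName=log_group,
        fromTime=to_epoch_ms(start),
        to=to_epoch_ms(end),
        destination=bucket,                    # hypothetical target bucket
        destinationPrefix="tier1/cloudwatch",  # hypothetical prefix
    )
    return resp["taskId"]
```

Since the export is asynchronous, a scheduler (cron, EventBridge) would call this once per window and poll the task status separately.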
Option2.
Kinesis -> S3 (if transformation is needed, Kinesis Data Firehose; otherwise, Kinesis Data Streams)
- Cons: Cost?
Option3.
API Gateway -> Lambda -> S3
- Cons: no message queue; Lambda is billed every time an event invokes it
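A Lambda behind an API Gateway proxy integration for Option 3 could simply write each request body to S3, which also illustrates the one-object-per-event cost of this route. A sketch (the bucket name and key layout are hypothetical):

```python
import json
from datetime import datetime, timezone

def object_key(now: datetime, request_id: str) -> str:
    """One S3 object per event, partitioned by date (hypothetical layout)."""
    return f"tier1/api/dt={now:%Y-%m-%d}/{request_id}.json"

def handler(event, context):
    """API Gateway proxy integration: persist the request body to S3."""
    import boto3  # deferred so object_key() is testable without AWS deps
    body = json.loads(event.get("body") or "{}")
    key = object_key(datetime.now(timezone.utc), context.aws_request_id)
    boto3.client("s3").put_object(
        Bucket="my-log-bucket",  # hypothetical bucket name
        Key=key,
        Body=json.dumps(body).encode("utf-8"),
    )
    return {"statusCode": 200, "body": json.dumps({"key": key})}
```

Since there is no queue in front, a downstream S3 outage or throttling surfaces directly as API errors, which is the trade-off noted above.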
Option4.
Fluentd -> S3
- Cons: no transformation possible
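For Option 4, the `fluent-plugin-s3` output plugin buffers events and flushes them to S3 on a time key. A sketch of the match section (bucket, region, tag pattern, and buffer path are all hypothetical):

```
<match app.logs.**>
  @type s3
  s3_bucket my-log-bucket        # hypothetical bucket
  s3_region ap-northeast-2       # hypothetical region
  path tier1/fluentd/
  <buffer time>
    @type file
    path /var/log/fluent/s3      # on-disk buffer before upload
    timekey 3600                 # flush roughly one object per hour
    timekey_wait 10m
  </buffer>
</match>
```

The time-keyed buffer is what avoids the one-file-per-event problem of Option 1, at the cost of up to an hour of delivery latency in this configuration.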
ETC
- Use crontab + the AWS CLI to send log files directly to S3 every day
- Send ELB access logs to S3 directly (docs)
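The crontab approach above might look like the following entry (the log path and bucket are hypothetical; note that `%` must be escaped as `\%` inside crontab, and `date -d yesterday` is GNU date):

```
# m h dom mon dow  command — ship yesterday's rotated log to S3 at 01:00 daily
0 1 * * * aws s3 cp /var/log/app/app.log.1 s3://my-log-bucket/tier1/cron/$(date -d yesterday +\%Y-\%m-\%d).log
```

This is the simplest option operationally, but delivery is at most once a day and there is no retry beyond what the CLI provides.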