Causal Machine Learning Survey (References, Part 2)

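Contents and Links

Causal Machine Learning Survey (Part 1)

Causal Machine Learning Survey (Part 2)

Causal Machine Learning Survey (Part 3)

Causal Machine Learning Survey (References, Part 1)

Causal Machine Learning Survey (References, Part 2)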

[301] E. Abbasnejad, D. Teney, A. Parvaneh, J. Shi, and A. van den Hengel, "Counterfactual Vision and Language Learning," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE, Jun. 2020, pp. 10041–10051. [Online]. Available: ieeexplore.ieee.org/document/91…

[302] M. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W. Chang, "Generating natural language adversarial examples," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, Oct.-Nov. 2018, pp. 2890–2896. [Online]. Available: aclanthology.org/D18-1316

[303] M. Iyyer, J. Wieting, K. Gimpel, and L. Zettlemoyer, "Adversarial example generation with syntactically controlled paraphrase networks," 2018. [Online]. Available: arxiv.org/abs/1804.06…

[304] J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, "HotFlip: White-box adversarial examples for text classification," 2017. [Online]. Available: arxiv.org/abs/1712.06…

[305] M. Ye, C. Gong, and Q. Liu, "SAFER: A structure-free approach for certified robustness to adversarial word substitutions," 2020. [Online]. Available: arxiv.org/abs/2005.14…

[306] M. Mozes, M. Bartolo, P. Stenetorp, B. Kleinberg, and L. D. Griffin, "Contrasting human- and machine-generated word-level adversarial examples for text classification," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, M. Moens, X. Huang, L. Specia, and S. W. Yih, Eds. Association for Computational Linguistics, 2021, pp. 8258–8270. [Online]. Available: doi.org/10.18653/v1…

[307] H. Zhao, C. Ma, X. Dong, A. T. Luu, Z.-H. Deng, and H. Zhang, "Certified robustness against natural language attacks by causal intervention," arXiv preprint arXiv:2205.12331, 2022.

[308] A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for sentiment analysis," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, pp. 142–150.

[309] A. Feder, N. Oved, U. Shalit, and R. Reichart, "CausaLM: Causal model explanation through counterfactual language models," Comput. Linguistics, vol. 47, no. 2, pp. 333–386, 2021. [Online]. Available: doi.org/10.1162/col…

[310] M. De-Arteaga, A. Romanov, H. Wallach, J. Chayes, C. Borgs, A. Chouldechova, S. Geyik, K. Kenthapadi, and A. T. Kalai, "Bias in bios: A case study of semantic representation bias in a high-stakes setting," in Proceedings of the Conference on Fairness, Accountability, and Transparency, ser. FAT* '19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 120–128. [Online]. Available: doi.org/10.1145/328…

[311] J. Vig, S. Gehrmann, Y. Belinkov, S. Qian, D. Nevo, Y. Singer, and S. M. Shieber, "Investigating gender bias in language models using causal mediation analysis," in Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., 2020. [Online]. Available: proceedings.neurips.cc/paper/2020/…

[312] S. Garg, V. Perot, N. Limtiaco, A. Taly, E. H. Chi, and A. Beutel, "Counterfactual fairness in text classification through robustness," in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2019, Honolulu, HI, USA, January 27-28, 2019, V. Conitzer, G. K. Hadfield, and S. Vallor, Eds. ACM, 2019, pp. 219–226. [Online]. Available: doi.org/10.1145/330…

[313] X. Zeng, Y. Li, Y. Zhai, and Y. Zhang, "Counterfactual generator: A weakly-supervised method for named entity recognition," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, B. Webber, T. Cohn, Y. He, and Y. Liu, Eds. Association for Computational Linguistics, 2020, pp. 7270–7280. [Online]. Available: doi.org/10.18653/v1…

[314] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai, "Man is to computer programmer as woman is to homemaker? Debiasing word embeddings," in Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Eds., vol. 29. Curran Associates, Inc., 2016. [Online]. Available: proceedings.neurips.cc/paper/2016/…

[315] J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang, "Men also like shopping: Reducing gender bias amplification using corpus-level constraints," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, Sep. 2017, pp. 2979–2989. [Online]. Available: aclanthology.org/D17-1323

[316] R. Zmigrod, S. J. Mielke, H. Wallach, and R. Cotterell, "Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, Jul. 2019, pp. 1651–1661. [Online]. Available: aclanthology.org/P19-1161

[317] J. Nivre, M. Abrams, Ž. Agić, L. Ahrenberg, L. Antonsen, K. Aplonova, M. J. Aranzabe, G. Arutie, M. Asahara, L. Ateyah, M. Attia, A. Atutxa, L. Augustinus, E. Badmaeva, M. Ballesteros, E. Banerjee, S. Bank, V. Barbu Mititelu, V. Basmov, J. Bauer, S. Bellato, K. Bengoetxea, Y. Berzak, I. A. Bhat, R. A. Bhat, E. Biagetti, E. Bick, R. Blokland, V. Bobicev, C. Börstell, C. Bosco, G. Bouma, S. Bowman, A. Boyd, A. Burchardt, M. Candito, B. Caron, G. Caron, G. Cebiroğlu Eryiğit, F. M. Cecchini, G. G. A. Celano, S. Čéplö, S. Cetin, F. Chalub, J. Choi, Y. Cho, J. Chun, S. Cinková, A. Collomb, Ç. Çöltekin, M. Connor, M. Courtin, E. Davidson, M.-C. de Marneffe, V. de Paiva, A. Diaz de Ilarraza, C. Dickerson, P. Dirix, K. Dobrovoljc, T. Dozat, K. Droganova, P. Dwivedi, M. Eli, A. Elkahky, B. Ephrem, T. Erjavec, A. Etienne, R. Farkas, H. Fernandez Alcalde, J. Foster, C. Freitas, K. Gajdošová, D. Galbraith, M. Garcia, M. Gärdenfors, S. Garza, K. Gerdes, F. Ginter, I. Goenaga, K. Gojenola, M. Gökırmak, Y. Goldberg, X. Gómez Guinovart, B. Gonzáles Saavedra, M. Grioni, N. Grūzītis, B. Guillaume, C. Guillot-Barbance, N. Habash, J. Hajič, J. Hajič jr., L. Hà Mỹ, N.-R. Han, K. Harris, D. Haug, B. Hladká, J. Hlaváčová, F. Hociung, P. Hohle, J. Hwang, R. Ion, E. Irimia, O. Ishola, T. Jelínek, A. Johannsen, F. Jørgensen, H. Kaşıkara, S. Kahane, H. Kanayama, J. Kanerva, B. Katz, T. Kayadelen, J. Kenney, V. Kettnerová, J. Kirchner, K. Kopacewicz, N. Kotsyba, S. Krek, S. Kwak, V. Laippala, L. Lambertino, L. Lam, T. Lando, S. D. Larasati, A. Lavrentiev, J. Lee, P. Lê Hồng, A. Lenci, S. Lertpradit, H. Leung, C. Y. Li, J. Li, K. Li, K. Lim, N. Ljubešić, O. Loginova, O. Lyashevskaya, T. Lynn, V. Macketanz, A. Makazhanov, M. Mandl, C. Manning, R. Manurung, C. Mărănduc, D. Mareček, K. Marheinecke, H. Martínez Alonso, A. Martins, J. Mašek, Y. Matsumoto, R. McDonald, G. Mendonça, N. Miekka, M. Misirpashayeva, A. Missilä, C. Mititelu, Y. Miyao, S. Montemagni, A. More, L. Moreno Romero, K. S. Mori, S. Mori, B. Mortensen, B. Moskalevskyi, K. Muischnek, Y. Murawaki, K. Müürisep, P. Nainwani, J. I. Navarro Horñiacek, A. Nedoluzhko, G. Nešpore-Bērzkalne, L. Nguyễn Thị, H. Nguyễn Thị Minh, V. Nikolaev, R. Nitisaroj, H. Nurmi, S. Ojala, A. Olúòkun, M. Omura, P. Osenova, R. Östling, L. Øvrelid, N. Partanen, E. Pascual, M. Passarotti, A. Patejuk, G. Paulino-Passos, S. Peng, C.-A. Perez, G. Perrier, S. Petrov, J. Piitulainen, E. Pitler, B. Plank, T. Poibeau, M. Popel, L. Pretkalniņa, S. Prévost, P. Prokopidis, A. Przepiórkowski, T. Puolakainen, S. Pyysalo, A. Rääbis, A. Rademaker, L. Ramasamy, T. Rama, C. Ramisch, V. Ravishankar, L. Real, S. Reddy, G. Rehm, M. Rießler, L. Rinaldi, L. Rituma, L. Rocha, M. Romanenko, R. Rosa, D. Rovati, V. Roşca, O. Rudina, J. Rueter, S. Sadde, B. Sagot, S. Saleh, T. Samardžić, S. Samson, M. Sanguinetti, B. Saulīte, Y. Sawanakunanon, N. Schneider, S. Schuster, D. Seddah, W. Seeker, M. Seraji, M. Shen, A. Shimada, M. Shohibussirri, D. Sichinava, N. Silveira, M. Simi, R. Simionescu, K. Simkó, M. Šimková, K. Simov, A. Smith, I. Soares-Bastos, C. Spadine, A. Stella, M. Straka, J. Strnadová, A. Suhr, U. Sulubacak, Z. Szántó, D. Taji, Y. Takahashi, T. Tanaka, I. Tellier, T. Trosterud, A. Trukhina, R. Tsarfaty, F. Tyers, S. Uematsu, Z. Urešová, L. Uria, H. Uszkoreit, S. Vajjala, D. van Niekerk, G. van Noord, V. Varga, E. Villemonte de la Clergerie, V. Vincze, L. Wallin, J. X. Wang, J. N. Washington, S. Williams, M. Wirén, T. Woldemariam, T.-s. Wong, C. Yan, M. M. Yavrumyan, Z. Yu, Z. Žabokrtský, A. Zeldes, D. Zeman, M. Zhang, and H. Zhu, "Universal Dependencies 2.3," 2018, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. [Online]. Available: hdl.handle.net/11234/1-289…

[318] B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. M. Mooij, "On causal and anticausal learning," in Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012. icml.cc / Omnipress, 2012. [Online]. Available: icml.cc/2012/papers…

[319] N. Kilbertus*, G. Parascandolo*, and B. Schölkopf*, "Generalization in anticausal learning," in NeurIPS 2018 Workshop on Critiquing and Correcting Trends in Machine Learning, Dec. 2018, *authors are listed in alphabetical order. [Online]. Available: ml-critique-correct.github.io/

[320] Z. Jin, J. von Kügelgen, J. Ni, T. Vaidhya, A. Kaushal, M. Sachan, and B. Schölkopf, "Causal direction of data collection matters: Implications of causal and anticausal learning for NLP," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Nov. 2021, pp. 9499–9513, *equal contribution. [Online]. Available: aclanthology.org/2021.emnlp-…

[321] W. Fan, Y. Ma, Q. Li, Y. He, E. Zhao, J. Tang, and D. Yin, "Graph neural networks for social recommendation," in The World Wide Web Conference, 2019, pp. 417–426.

[322] X. Jing and J. Xu, "Fast and effective protein model refinement using deep graph neural networks," Nature Computational Science, vol. 1, no. 7, pp. 462–469, 2021.

[323] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec, "Open graph benchmark: Datasets for machine learning on graphs," Advances in Neural Information Processing Systems, vol. 33, pp. 22118–22133, 2020.

[324] Y. Wu, X. Wang, A. Zhang, X. He, and T.-S. Chua, "Discovering invariant rationales for graph neural networks," in International Conference on Learning Representations, 2022. [Online]. Available: openreview.net/forum?id=hG…

[325] Y. Chen, Y. Zhang, H. Yang, K. Ma, B. Xie, T. Liu, B. Han, and J. Cheng, "Invariance principle meets out-of-distribution generalization on graphs," arXiv preprint arXiv:2202.05441, 2022.

[326] Y. Sui, X. Wang, J. Wu, X. He, and T.-S. Chua, "Deconfounded training for graph neural networks," ArXiv, vol. abs/2112.15089, 2021.

[327] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Magazine, vol. 29, no. 3, pp. 93–106, 2008.

[328] T. Zhao, G. Liu, D. Wang, W. Yu, and M. Jiang, "Learning from counterfactual links for link prediction," 2021. [Online]. Available: arxiv.org/abs/2106.02…

[329] D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun, "Measuring and relieving the over-smoothing problem for graph neural networks from the topological view," in The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 2020, pp. 3438–3445. [Online]. Available: ojs.aaai.org/index.php/A…

[330] F. Feng, W. Huang, X. He, X. Xin, Q. Wang, and T.-S. Chua, "Should graph convolution trust neighbors? A simple causal inference method," in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1208–1218.

[331] M. Zečević, D. S. Dhami, P. Veličković, and K. Kersting, "Relating graph neural networks to structural causal models," 2021.

[332] E. Todorov, T. Erez, and Y. Tassa, "MuJoCo: A physics engine for model-based control," in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033.

[333] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, "The arcade learning environment: An evaluation platform for general agents," Journal of Artificial Intelligence Research, vol. 47, pp. 253–279, Jun. 2013.

[334] N. R. Ke, A. Didolkar, S. Mittal, A. Goyal, G. Lajoie, S. Bauer, D. Rezende, Y. Bengio, M. Mozer, and C. Pal, "Systematic evaluation of causal discovery in visual model based reinforcement learning," 2021. [Online]. Available: arxiv.org/abs/2107.00…

[335] J. X. Wang, M. King, N. P. M. Porcel, Z. Kurth-Nelson, T. Zhu, C. Deck, P. Choy, M. Cassin, M. Reynolds, H. F. Song, G. Buttimore, D. P. Reichert, N. C. Rabinowitz, L. Matthey, D. Hassabis, A. Lerchner, and M. Botvinick, "Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents," in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021. [Online]. Available: openreview.net/forum?id=eZ…

[336] D. McDuff, Y. Song, J. Lee, V. Vineet, S. Vemprala, N. A. Gyde, H. Salman, S. Ma, K. Sohn, and A. Kapoor, "Causalcity: Complex simulations with agency for causal discovery and reasoning," in First Conference on Causal Learning and Reasoning, 2022. [Online]. Available: openreview.net/forum?id=YW…

[337] K. Yi, C. Gan, Y. Li, P. Kohli, J. Wu, A. Torralba, and J. B. Tenenbaum, "CLEVRER: Collision events for video representation and reasoning," in International Conference on Learning Representations, 2020. [Online]. Available: openreview.net/forum?id=Hk…

[338] V. Ramanishka, Y. Chen, T. Misu, and K. Saenko, "Toward driving scene understanding: A dataset for learning driver behavior and causal reasoning," in 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. Computer Vision Foundation / IEEE Computer Society, 2018, pp. 7699–7707. [Online]. Available: openaccess.thecvf.com/content_cvp…

[339] T. You and B. Han, "Traffic accident benchmark for causality recognition," in Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part VII, ser. Lecture Notes in Computer Science, A. Vedaldi, H. Bischof, T. Brox, and J. Frahm, Eds., vol. 12352. Springer, 2020, pp. 540–556. [Online]. Available: doi.org/10.1007/978…

[340] J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: pre-training of deep bidirectional transformers for language understanding," in NAACL-HLT (1). Association for Computational Linguistics, 2019, pp. 4171–4186.

[341] D. Kaushik, E. Hovy, and Z. Lipton, "Learning the difference that makes a difference with counterfactually-augmented data," in International Conference on Learning Representations, 2020. [Online]. Available: openreview.net/forum?id=Sklgs0NFvr

[342] J. Frohberg and F. Binder, "CRASS: A novel data set and benchmark to test counterfactual reasoning of large language models," 2021.

[343] A. Srivastava, A. Rastogi, A. B. Rao, A. A. M. Shoeb, A. Abid, A. Fisch, A. R. Brown, A. Santoro, A. Gupta, A. Garriga-Alonso, A. Kluska, A. Lewkowycz, A. Agarwal, A. Power, A. Ray, A. Warstadt, A. W. Kocurek, A. Safaya, A. Tazarv, A. Xiang, A. Parrish, A. Nie, A. Hussain, A. Askell, A. Dsouza, A. A. Rahane, A. S. Iyer, A. J. Andreassen, A. Santilli, A. Stuhlmüller, A. M. Dai, A. D. La, A. K. Lampinen, A. Zou, A. Jiang, A. Chen, A. Vuong, A. Gupta, A. Gottardi, A. Norelli, A. Venkatesh, A. Gholamidavoodi, A. Tabassum, A. Menezes, A. Kirubarajan, A. Mullokandov, A. Sabharwal, A. Herrick, A. Efrat, A. Erdem, A. Karakaş, B. R. Roberts, B. S. Loe, B. Zoph, B. Bojanowski, B. Ozyurt, B. Hedayatnia, B. Neyshabur, B. Inden, B. Stein, B. Ekmekci, B. Y. Lin, B. S. Howald, C. Diao, C. Dour, C. Stinson, C. Argueta, C. F. Ramírez, C. Singh, C. Rathkopf, C. Meng, C. Baral, C. Wu, C. Callison-Burch, C. Waites, C. Voigt, C. D. Manning, C. Potts, C. T. Ramirez, C. Rivera, C. Siro, C. Raffel, C. Ashcraft, C. Garbacea, D. Sileo, D. H. Garrette, D. Hendrycks, D. Kilman, D. Roth, D. Freeman, D. Khashabi, D. Levy, D. González, D. Hernandez, D. Chen, D. Ippolito, D. Gilboa, D. Dohan, D. Drakard, D. Jurgens, D. Datta, D. Ganguli, D. Emelin, D. Kleyko, D. Yuret, D. Chen, D. Tam, D. Hupkes, D. Misra, D. Buzan, D. C. Mollo, D. Yang, D.-H. Lee, E. Shutova, E. D. Cubuk, E. Segal, E. Hagerman, E. Barnes, E. P. Donoway, E. Pavlick, E. Rodolà, E. F. Lam, E. Chu, E. Tang, E. Erdem, E. Chang, E. A. Chi, E. Dyer, E. Jerzak, E. Kim, E. E. Manyasi, E. Zheltonozhskii, F. Xia, F. Siar, F. Martínez-Plumed, F. Happé, F. Chollet, F. Rong, G. Mishra, G. I. Winata, G. de Melo, G. Kruszewski, G. Parascandolo, G. Mariani, G. Wang, G. Jaimovitch-López, G. Betz, G. Gur-Ari, H. Galijasevic, H. S. Kim, H. Rashkin, H. Hajishirzi, H. Mehta, H. Bogar, H. Shevlin, H. Schütze, H. Yakura, H. Zhang, H. Wong, I. A.-S. Ng, I. Noble, J. Jumelet, J. Geissinger, J. Kernion, J. Hilton, J. Lee, J. F. Fisac, J. B. Simon, J. Koppel, J. Zheng, J. Zou, J. Kocoń, J. Thompson, J. Kaplan, J. Radom, J. Sohl-Dickstein, J. Phang, J. Wei, J. Yosinski, J. Novikova, J. Bosscher, J. Marsh, J. Kim, J. Taal, J. Engel, J. O. Alabi, J. Xu, J. Song, J. Tang, J. W. Waweru, J. Burden, J. Miller, J. U. Balis, J. Berant, J. Frohberg, J. Rozen, J. Hernández-Orallo, J. Boudeman, J. Jones, J. B. Tenenbaum, J. S. Rule, J. Chua, K. Kanclerz, K. Livescu, K. Krauth, K. Gopalakrishnan, K. Ignatyeva, K. Markert, K. D. Dhole, K. Gimpel, K. O. Omondi, K. W. Mathewson, K. Chiafullo, K. Shkaruta, K. Shridhar, K. McDonell, K. Richardson, L. Reynolds, L. Gao, L. Zhang, L. Dugan, L. Qin, L. Contreras-Ochando, L.-P. Morency, L. Moschella, L. Lam, L. Noble, L. Schmidt, L. He, L. O. Colón, L. Metz, L. K. Şenel, M. Bosma, M. Sap, M. ter Hoeve, M. Andrea, M. S. Farooqi, M. Faruqui, M. Mazeika, M. Baturan, M. Marelli, M. Maru, M. Quintana, M. Tolkiehn, M. Giulianelli, M. Lewis, M. Potthast, M. Leavitt, M. Hagen, M. Schubert, M. Baitemirova, M. Arnaud, M. A. McElrath, M. A. Yee, M. Cohen, M. Gu, M. I. Ivanitskiy, M. Starritt, M. Strube, M. Swędrowski, M. Bevilacqua, M. Yasunaga, M. Kale, M. Cain, M. Xu, M. Suzgun, M. Tiwari, M. Bansal, M. Aminnaseri, M. Geva, M. Gheini, T. MukundVarma, N. Peng, N. Chi, N. Lee, N. G.-A. Krakover, N. Cameron, N. S. Roberts, N. Doiron, N. Nangia, N. Deckers, N. Muennighoff, N. S. Keskar, N. Iyer, N. Constant, N. Fiedel, N. Wen, O. Zhang, O. Agha, O. Elbaghdadi, O. Levy, O. Evans, P. Casares, P. Doshi, P. Fung, P. P. Liang, P. Vicol, P. Alipoormolabashi, P. Liao, P. Liang, P. W. Chang, P. Eckersley, P. M. Htut, P.-B. Hwang, P. Miłkowski, P. S. Patil, P. Pezeshkpour, P. Oli, Q. Mei, Q. LYU, Q. Chen, R. Banjade, R. E. Rudolph, R. Gabriel, R. Habacker, R. R. Delgado, R. Millière, R. Garg, R. Barnes, R. A. Saurous, R. Arakawa, R. Raymaekers, R. Frank, R. Sikand, R. Novak, R. Sitelew, R. Lebras, R. Liu, R. Jacobs, R. Zhang, R. Salakhutdinov, R. Chi, R. Lee, R. Stovall, R. Teehan, R. Yang, S. J. Singh, S. M. Mohammad, S. Anand, S. Dillavou, S. Shleifer, S. Wiseman, S. Gruetter, S. Bowman, S. S. Schoenholz, S. Han, S. Kwatra, S. A. Rous, S. Ghazarian, S. Ghosh, S. Casey, S. Bischoff, S. Gehrmann, S. Schuster, S. Sadeghi, S. S. Hamdan, S. Zhou, S. Srivastava, S. Shi, S. Singh, S. Asaadi, S. S. Gu, S. Pachchigar, S. Toshniwal, S. Upadhyay, S. Debnath, S. Shakeri, S. Thormeyer, S. Melzi, S. Reddy, S. P. Makini, S. hwan Lee, S. B. Torene, S. Hatwar, S. Dehaene, S. Divic, S. Ermon, S. R. Biderman, S. C. Lin, S. Prasad, S. T. Piantadosi, S. M. Shieber, S. Misherghi, S. Kiritchenko, S. Mishra, T. Linzen, T. Schuster, T. Li, T. Yu, T. A. Ali, T. Hashimoto, T.-L. Wu, T. Desbordes, T. Rothschild, T. Phan, T. Wang, T. Nkinyili, T. Schick, T. N. Kornev, T. Telleen-Lawton, T. Tunduny, T. Gerstenberg, T. Chang, T. Neeraj, T. Khot, T. O. Shultz, U. Shaham, V. Misra, V. Demberg, V. Nyamai, V. Raunak, V. V. Ramasesh, V. U. Prabhu, V. Padmakumar, V. Srikumar, W. Fedus, W. Saunders, W. Zhang, W. Vossen, X. Ren, X. F. Tong, X. Wu, X. Shen, Y. Yaghoobzadeh, Y. Lakretz, Y. Song, Y. Bahri, Y. J. Choi, Y. Yang, Y. Hao, Y. Chen, Y. Belinkov, Y. Hou, Y. Hou, Y. Bai, Z. Seid, Z. Xinran, Z. Zhao, Z. F. Wang, Z. J. Wang, Z. Wang, Z. Wu, S. Singh, and U. Shaham, "Beyond the imitation game: Quantifying and extrapolating the capabilities of language models," ArXiv, vol. abs/2206.04615, 2022.

[344] L. Yang, Z. Wang, Y. Wu, J. Yang, and Y. Zhang, "Towards fine-grained causal reasoning and QA," arXiv preprint arXiv:2204.07408, 2022.

[345] J. Pearl, "Invited commentary: Understanding bias amplification," American Journal of Epidemiology, vol. 174, no. 11, pp. 1223–1227, 2011.

[346] M. L. Mitchell and J. M. Jolley, Research Design Explained, 2010.

[347] W. H. Jefferys and J. O. Berger, "Ockham's razor and Bayesian analysis," American Scientist, vol. 80, no. 1, pp. 64–72, 1992.

[348] L. Cheng, R. Guo, R. Moraffah, P. Sheth, K. S. Candan, and H. Liu, "Evaluation methods and measures for causal learning algorithms," IEEE Transactions on Artificial Intelligence, 2022.

[349] D. B. Rubin, "Estimating causal effects of treatments in randomized and nonrandomized studies," Journal of Educational Psychology, vol. 66, no. 5, pp. 688–701, 1974.

[350] K. Quach, "OpenAI shuts down robotics team because it doesn't have enough data yet," 2021, accessed 30 May 2022. [Online]. Available: www.theregister.com/2021/07/18/…

[351] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. P. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nat., vol. 529, no. 7587, pp. 484–489, 2016. [Online]. Available: doi.org/10.1038/nat…

[352] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. P. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, "Mastering the game of Go without human knowledge," Nat., vol. 550, no. 7676, pp. 354–359, 2017. [Online]. Available: doi.org/10.1038/nat…

[353] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller, "Playing Atari with deep reinforcement learning," CoRR, vol. abs/1312.5602, 2013. [Online]. Available: arxiv.org/abs/1312.56…

[354] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev et al., "Grandmaster level in StarCraft II using multi-agent reinforcement learning," Nature, vol. 575, no. 7782, pp. 350–354, 2019.

[355] M. Komorowski, L. A. Celi, O. Badawi, A. C. Gordon, and A. A. Faisal, "The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care," Nature Medicine, vol. 24, no. 11, pp. 1716–1720, 2018.

[356] M. G. Bellemare, S. Candido, P. S. Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang, "Autonomous navigation of stratospheric balloons using reinforcement learning," Nature, vol. 588, no. 7836, pp. 77–82, 2020.

[357] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," 2017.

[358] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.

[359] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, "JAX: composable transformations of Python+NumPy programs," 2018. [Online]. Available: github.com/google/jax

[360] J. Heek, A. Levskaya, A. Oliver, M. Ritter, B. Rondepierre, A. Steiner, and M. van Zee, "Flax: A neural network library and ecosystem for JAX," 2020. [Online]. Available: github.com/google/flax

[361] T. Hennigan, T. Cai, T. Norman, and I. Babuschkin, "Haiku: Sonnet for JAX," 2020. [Online]. Available: github.com/deepmind/dm…

[362] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, "Transformers: State-of-the-art natural language processing," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics, Oct. 2020, pp. 38–45. [Online]. Available: www.aclweb.org/anthology/2…

[363] R. Wightman, "PyTorch image models," 2019. [Online]. Available: github.com/rwightman/p…

[364] M. Fey and J. E. Lenssen, "Fast graph representation learning with PyTorch Geometric," in ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.

[365] K. Xia, K.-Z. Lee, Y. Bengio, and E. Bareinboim, "The causal-neural connection: Expressiveness, learnability, and inference," Advances in Neural Information Processing Systems, vol. 34, 2021.

[366] I. Gulrajani and D. Lopez-Paz, "In search of lost domain generalization," in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021. [Online]. Available: openreview.net/forum?id=lQ…

[367] L. Zintgraf, K. Shiarlis, M. Igl, S. Schulze, Y. Gal, K. Hofmann, and S. Whiteson, "VariBAD: A very good method for Bayes-adaptive deep RL via meta-learning," arXiv preprint arXiv:1910.08348, 2019.

[368] A. Gupta, R. Mendonca, Y. Liu, P. Abbeel, and S. Levine, "Meta-reinforcement learning of structured exploration strategies," Advances in Neural Information Processing Systems, vol. 31, 2018.

[369] B. Schölkopf, "Causality for machine learning," 2019.

[370] A. Feder, K. A. Keith, E. Manzoor, R. Pryzant, D. Sridhar, Z. Wood-Doughty, J. Eisenstein, J. Grimmer, R. Reichart, M. E. Roberts, B. M. Stewart, V. Veitch, and D. Yang, "Causal inference in natural language processing: Estimation, prediction, interpretation and beyond," 2021.

[371] T. McCoy, E. Pavlick, and T. Linzen, "Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, Jul. 2019, pp. 3428–3448. [Online]. Available: aclanthology.org/P19-1334

[372] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi, "A survey of methods for explaining black box models," ACM Comput. Surv., vol. 51, no. 5, Aug. 2018. [Online]. Available: doi.org/10.1145/323…

[373] L. Cheng, A. Mosallanezhad, P. Sheth, and H. Liu, "Causal learning for socially responsible AI," 2021. [Online]. Available: arxiv.org/abs/2104.12…

[374] E. H. Kennedy, "Optimal doubly robust estimation of heterogeneous causal effects," arXiv preprint arXiv:2004.14497, 2020.

[375] S. R. Künzel, J. S. Sekhon, P. J. Bickel, and B. Yu, "Metalearners for estimating heterogeneous treatment effects using machine learning," Proceedings of the National Academy of Sciences, vol. 116, no. 10, pp. 4156–4165, 2019.

[376] U. Shalit, F. D. Johansson, and D. Sontag, "Estimating individual treatment effect: generalization bounds and algorithms," in International Conference on Machine Learning. PMLR, 2017, pp. 3076–3085.

[377] C. Shi, D. Blei, and V. Veitch, "Adapting neural networks for the estimation of treatment effects," in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32. Curran Associates, Inc., 2019.

[378] A. Caron, I. Manolopoulou, and G. Baio, "Estimating individual treatment effects using non-parametric regression models: a review," arXiv preprint arXiv:2009.06472, 2020.

[379] X. Nie and S. Wager, "Quasi-oracle estimation of heterogeneous treatment effects," Biometrika, Sep. 2020.

[380] A. Curth and M. van der Schaar, "Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms," in The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event, ser. Proceedings of Machine Learning Research, A. Banerjee and K. Fukumizu, Eds., vol. 130. PMLR, 2021, pp. 1810–1818.

[381] L. Nie, M. Ye, Q. Liu, and D. Nicolae, "VCNet and functional targeted regularization for learning causal effects of continuous treatments," in International Conference on Learning Representations, 2021.

[382] Y.-F. Zhang, H. Zhang, Z. C. Lipton, L. E. Li, and E. P. Xing, "Exploring transformer backbones for heterogeneous treatment effect estimation," 2022. [Online]. Available: arxiv.org/abs/2202.01…

[383] C. F. Manski, "Nonparametric bounds on treatment effects," The American Economic Review, vol. 80, no. 2, pp. 319–323, 1990.

[384] ——, Partial identification of probability distributions. Springer, 2003, vol. 5.

[385] K. Imai, L. Keele, and T. Yamamoto, "Identification, inference and sensitivity analysis for causal mediation effects," Statistical Science, vol. 25, no. 1, pp. 51–71, 2010.

[386] C. Cinelli, D. Kumor, B. Chen, J. Pearl, and E. Bareinboim, "Sensitivity analysis of linear structural causal models," in International Conference on Machine Learning. PMLR, 2019, pp. 1252–1261.

[387] M. Baiocchi, J. Cheng, and D. S. Small, "Instrumental variable methods for causal inference," Statistics in Medicine, vol. 33, no. 13, pp. 2297–2340, 2014.

[388] W. Miao, Z. Geng, and E. J. Tchetgen Tchetgen, "Identifying causal effects with proxy variables of an unmeasured confounder," Biometrika, vol. 105, no. 4, pp. 987–993, 2018.

[389] C. Louizos, U. Shalit, J. M. Mooij, D. Sontag, R. Zemel, and M. Welling, "Causal effect inference with deep latent-variable models," in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017.

[390] A. Mastouri, Y. Zhu, L. Gultchin, A. Korba, R. Silva, M. Kusner, A. Gretton, and K. Muandet, "Proximal causal learning with kernels: Two-stage estimation and moment restriction," in International Conference on Machine Learning. PMLR, 2021, pp. 7512–7523.

[391] L. Xu, H. Kanagawa, and A. Gretton, "Deep proxy causal learning and its application to confounded bandit policy evaluation," Advances in Neural Information Processing Systems, vol. 34, 2021.

[392] J. H. Stock and F. Trebbi, "Retrospectives: Who invented instrumental variable regression?" Journal of Economic Perspectives, vol. 17, no. 3, pp. 177–194, 2003.

[393] R. Singh, M. Sahani, and A. Gretton, "Kernel instrumental variable regression," in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, and R. Garnett, Eds., 2019, pp. 4595–4607. [Online]. Available: proceedings.neurips.cc/paper/2019/hash/17b3c7061788dbe82de5abe9f6fe22b3-Abstract.html

[394] A. Bennett, N. Kallus, and T. Schnabel, "Deep generalized method of moments for instrumental variable analysis," Advances in Neural Information Processing Systems, vol. 32, 2019.

[395] K. Muandet, A. Mehrjou, S. K. Lee, and A. Raj, "Dual instrumental variable regression," Advances in Neural Information Processing Systems, vol. 33, pp. 2710–2721, 2020.

[396] N. Dikkala, G. Lewis, L. Mackey, and V. Syrgkanis, "Minimax estimation of conditional moment models," Advances in Neural Information Processing Systems, vol. 33, pp. 12 248–12 262, 2020.

[397] D. M. Chickering, Learning Bayesian Networks is NP-Complete. New York, NY: Springer New York, 1996, pp. 121–130.

[398] A. P. Singh and A. W. Moore, Finding optimal Bayesian networks by dynamic programming. Citeseer, 2005.

[399] J. Xiang and S. Kim, "A* lasso for learning a sparse Bayesian network structure for continuous variables," in Advances in Neural Information Processing Systems, vol. 26, 2013.

[400] J. Cussens, "Bayesian network learning with cutting planes," in Uncertainty in Artificial Intelligence, 2011.

[401] C. Squires and C. Uhler, "Causal structure learning: a combinatorial perspective," 2022. [Online]. Available: arxiv.org/abs/2206.01…

[402] M. Scanagatta, C. P. de Campos, G. Corani, and M. Zaffalon, "Learning Bayesian networks with thousands of variables," in Advances in Neural Information Processing Systems, vol. 28, 2015.

[403] B. Aragam and Q. Zhou, "Concave penalized estimation of sparse Gaussian Bayesian networks," The Journal of Machine Learning Research, vol. 16, 2015.

[404] J. D. Ramsey, M. Glymour, R. Sanchez-Romero, and C. Glymour, "A million variables and more: the fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images," International Journal of Data Science and Analytics, vol. 3, 2017.

[405] X. Zheng, B. Aragam, P. Ravikumar, and E. P. Xing, "DAGs with NO TEARS: continuous optimization for structure learning," in Advances in Neural Information Processing Systems, 2018.

[406] Y. Yu, J. Chen, T. Gao, and M. Yu, "DAG-GNN: DAG structure learning with graph neural networks," in ICML, vol. 97, 2019.

[407] X. Zheng, C. Dan, B. Aragam, P. Ravikumar, and E. Xing, "Learning sparse nonparametric DAGs," in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 3414–3425.

[408] I. Ng, A. Ghassami, and K. Zhang, "On the role of sparsity and DAG constraints for learning linear dags," in NeurIPS, 2020.

[409] P. Brouillard, S. Lachapelle, A. Lacoste, S. Lacoste-Julien, and A. Drouin, "Differentiable causal discovery from interventional data," in NeurIPS, 2020.

[410] Y. He, P. Cui, Z. Shen, R. Xu, F. Liu, and Y. Jiang, "DARING: differentiable causal discovery with residual independence," in KDD, 2021.

[411] P. Lippe, T. Cohen, and E. Gavves, "Efficient neural causal discovery without acyclicity constraints," in International Conference on Learning Representations, 2022. [Online]. Available: openreview.net/forum?id=eY…

[412] N. Friedman and D. Koller, "Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks," Machine Learning, vol. 50, 2003.

[413] M. Gao, Y. Ding, and B. Aragam, "A polynomial-time algorithm for learning nonparametric causal graphs," in Advances in Neural Information Processing Systems, 2020.

[414] C. Cundy, A. Grover, and S. Ermon, "BCD nets: Scalable variational approaches for Bayesian causal discovery," in Advances in Neural Information Processing Systems, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., 2021.

[415] V. Zantedeschi, J. Kaddour, L. Franceschi, M. Kusner, and V. Niculae, "DAG learning on the permutahedron," in ICLR 2022 Workshop on the Elements of Reasoning: Objects, Structure and Causality, 2022.

[416] M. J. Vowels, N. C. Camgoz, and R. Bowden, "D'ya like DAGs? A survey on structure learning and causal discovery," arXiv preprint arXiv:2103.02582, 2021.